Despite its compact 3.8-billion-parameter size, this experimental version of Phi-4-Mini achieves reasoning performance on par with or surpassing significantly larger models, including DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B.
The goal of the OpenThoughts project is to create open-source datasets for training reasoning models and to produce the first model trained on public reasoning data that matches DeepSeek-R1-Distill-Qwen-7B on standard reasoning benchmarks such as AIME and LiveCodeBench.
A framework is introduced to evaluate sycophantic behavior in ChatGPT-4o, Claude-Sonnet, and Gemini-1.5-Pro across the AMPS (mathematics) and MedQuad (medical advice) datasets.
AlphaProof is presented, an AlphaZero-inspired agent that learns to find formal proofs through RL by training on millions of auto-formalized problems, and substantially improves state-of-the-art results on historical mathematics competition problems.
This study examines layer pruning via parameter-efficient finetuning methods, specifically quantization and Low-Rank Adapters (QLoRA), such that each of the experiments can be performed on a single 40GB A100 GPU.
The experimental results demonstrate the efficacy of RG both for evaluating reasoning models and for training them with reinforcement learning; its key innovation is the ability to generate virtually unlimited training data with adjustable complexity, unlike most previous reasoning datasets, which are fixed.
An extensive systematic and bibliometric literature review of hybrid methods combining optimization and machine learning techniques for clustering and classification aims to identify how such combinations can overcome the difficulties of one or both methodologies.
This review delves into GWO-related research conducted between 2019 and 2022, encompassing over 200 research articles, and explores the growth of GWO in terms of publications, citations, and the domains that leverage its potential.
This review explores the critical role of hyperparameter tuning in ML, detailing its importance, applications, and tuning methods, including grid search, random search, Bayesian optimization, and meta-learning.
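As a minimal illustration of two of these methods, grid search and random search can both be framed as evaluating candidate configurations against a validation objective; the objective and search range below are hypothetical, not taken from the review.

```python
import random

# Hypothetical objective: validation loss as a function of learning rate,
# assumed unimodal with its optimum at lr = 0.1.
def val_loss(lr):
    return (lr - 0.1) ** 2

# Grid search: evaluate a fixed lattice of candidate values.
grid = [10 ** e for e in (-4, -3, -2, -1, 0)]
best_grid = min(grid, key=val_loss)

# Random search: sample candidates log-uniformly from the same range.
random.seed(0)
candidates = [10 ** random.uniform(-4, 0) for _ in range(20)]
best_random = min(candidates, key=val_loss)

print(best_grid, best_random)
```

With the same evaluation budget, random search probes many more distinct values along each axis, which is why it often outperforms grid search in higher dimensions.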
This work identifies systematic issues that have resulted in a distorted playing field in Chatbot Arena and offers actionable recommendations to reform the Chatbot Arena's evaluation framework and promote fairer, more transparent benchmarking for the field.
This tutorial provides an overview of near-field NGAT technology, which shifts from conventional far-field channel models to new near-field channel models, and discusses recent advances in semantic-aware NGAT technologies, which can utilize new metrics for advanced transceiver designs.
It is shown that accuracy generally declines as reasoning chains grow across all models and compute settings, even when controlling for question difficulty, and that while o3-mini (h) achieves a marginal accuracy gain over o3-mini (m), it does so by allocating substantially more reasoning tokens across all problems, even those that o3-mini (m) can already solve.
It is shown that if data are replaced, the test error increases with the number of model-fitting iterations, but if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
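The replace-versus-accumulate contrast can be illustrated with a toy Gaussian-fitting loop; this is a minimal sketch under assumed parameters (model class, sample size, and iteration count are illustrative, not the paper's actual setup).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model-fitting loop: repeatedly fit a Gaussian, then generate the
# next round of training data from the fitted model.
def iterate(accumulate, n=20, iters=500):
    pool = rng.normal(0.0, 1.0, n)           # real data
    mu, sigma = pool.mean(), pool.std()
    for _ in range(iters):
        synth = rng.normal(mu, sigma, n)     # model-generated data
        # Either replace the training pool or accumulate into it.
        pool = np.concatenate([pool, synth]) if accumulate else synth
        mu, sigma = pool.mean(), pool.std()
    return sigma

sigma_replace = iterate(accumulate=False)    # variance collapses toward 0
sigma_accum = iterate(accumulate=True)       # variance stays bounded
print(sigma_replace, sigma_accum)
```

Under replacement the fitted variance follows a downward-drifting multiplicative random walk and collapses; under accumulation the original real data remain in the pool, and the per-iteration perturbations shrink, keeping the estimate bounded.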
The R package sdmTMB is introduced, which extends the flexible interface familiar to users of lme4, glmmTMB, and mgcv to include spatial and spatiotemporal latent GMRFs using a stochastic partial differential equation (SPDE) based approach, and aims to open this useful class of models to a wider field of geostatistical analysts.
This paper incorporates a memory trace (MT) procedure that captures and aggregates the historical dynamics of the system, making the memory effect explicit and demonstrating the existence of memory effects in fractional-order derivatives.
This framework frames dataset selection as an optimization problem and, instead of relying on proxies, explicitly models how the learning process uses training datapoints to predict on the target tasks.
It is argued that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
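One common way to realize a categorical value target is a two-hot encoding of the scalar return over a fixed support; the sketch below is a minimal numpy illustration (the support range, bin count, and helper names are assumptions, not the paper's exact construction).

```python
import numpy as np

# Fixed support of "atoms" for the categorical value distribution
# (support [-1, 1] with 11 bins is an illustrative choice).
atoms = np.linspace(-1.0, 1.0, 11)

def two_hot(value, atoms):
    """Encode a scalar target as a two-hot categorical distribution."""
    value = np.clip(value, atoms[0], atoms[-1])
    upper = max(np.searchsorted(atoms, value), 1)
    lower = upper - 1
    w = (value - atoms[lower]) / (atoms[upper] - atoms[lower])
    probs = np.zeros_like(atoms)
    probs[lower], probs[upper] = 1.0 - w, w
    return probs

def cross_entropy(logits, target_probs):
    """Loss between predicted logits and the two-hot target."""
    logp = logits - np.log(np.exp(logits).sum())
    return -(target_probs * logp).sum()

target = two_hot(0.13, atoms)
# The scalar value is recovered as the expectation over the atoms.
print((target * atoms).sum())
```

The value head then outputs logits over the atoms and is trained with `cross_entropy`; the scalar value estimate is read off as the expectation of the predicted distribution.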
Investigation of mathematics teachers’ perceptions of implemented AI systems and applications in Abu Dhabi Emirate schools revealed that AI could be used as an educational tool to facilitate teaching and improve students’ performance by including AI systems and applications in the curricula.
This article starts with an overview of orthogonal, physical-layer multicasting, space-domain, power-domain (PD), rate-splitting, code-domain MAs, MAs in other domains, and random access (RA), and highlights the importance of research on universal MA (UMA), which seeks to shrink rather than grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions.
This paper theoretically characterizes the behavior of AUROC and AUPRC in the presence of model mistakes, establishing clearly that AUPRC is not generally superior in cases of class imbalance and showing that AUPRC can even be a harmful metric.
This letter addresses, for the first time, the uplink performance optimization of multi-user pinching-antenna (PA) systems, recently developed for next-generation wireless networks, and proposes an effective approach that separately optimizes the positions of the PAs and the resource allocation.
FGN is presented, a simple, scalable and flexible modeling approach which significantly outperforms the current state-of-the-art models and produces state-of-the-art ensemble forecasts as measured by a range of deterministic and probabilistic metrics.
This study conducts a systematic literature review (SLR) to investigate the applications and trends of AI in mathematics education by examining articles published in reputable journals indexed in Web of Science and Scopus.
A comprehensive review of GNNs in epidemic tasks and methodologies is furnished, and hierarchical taxonomies for both epidemic tasks and methodologies are introduced, offering a trajectory of development within this domain.
This work demonstrates the approach for graphing distributions of covariance matrices on several models, including the Wishart, inverse-Wishart, and scaled inverse-Wishart families in different dimensions using a tableau of low-dimensional displays.
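A minimal sketch of the sampling step such displays rely on, assuming an identity scale matrix and using the Bartlett decomposition for the Wishart family (dimension, degrees of freedom, and number of draws are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_wishart(df, d, rng):
    """Draw from Wishart(df, I_d) via the Bartlett decomposition:
    W = A @ A.T with chi-square diagonal and standard-normal subdiagonal."""
    A = np.zeros((d, d))
    for i in range(d):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.standard_normal(i)
    return A @ A.T

d, df = 3, 5
draws = np.array([sample_wishart(df, d, rng) for _ in range(2000)])

# Summaries one might display: per-dimension standard deviations and a
# pairwise correlation, which separate scale from dependence structure.
sds = np.sqrt(np.diagonal(draws, axis1=1, axis2=2))
corr01 = draws[:, 0, 1] / (sds[:, 0] * sds[:, 1])
print(draws.mean(axis=0).round(2))  # sample mean approaches E[W] = df * I
```

Decomposing each draw into standard deviations and correlations is what makes a tableau of low-dimensional displays feasible: each panel shows a scalar marginal of the matrix-valued distribution.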
For production use cases, a new distillation engine is introduced that converts TabPFN-2.5 into a compact MLP or tree ensemble, preserving most of its accuracy while delivering orders-of-magnitude lower latency and plug-and-play deployment.
The Fractional Kolmogorov-Arnold Network (fKAN), a novel neural network architecture that incorporates the distinctive attributes of KANs with a trainable adaptive fractional-orthogonal Jacobi function as its basis function, is presented.
This paper focuses on the estimation of Gaussian process covariance parameters by reviewing recent works on the advantages and disadvantages of the usual estimation methods, the most relevant validation criteria (for detecting poor estimation), and recent robust and corrective methods.
This survey aims to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation, by establishing a unified definition for these methods and introducing a hierarchical taxonomy to categorize the challenges they address.
This work presents soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data.
To ensure the exponential mean-square stability of the delayed neural networks, the article constructs a Lyapunov-Krasovskii functional (LKF) that incorporates information about the delay bounds; the stability criterion is derived in the form of linear matrix inequalities by employing modified free-matrix-based integral inequalities.
An analysis of the dynamical mechanism underlying memorization is presented, highlighting the need for regularization to avoid reproducing the analytically tractable minimizer and laying the foundations for a principled understanding of how to regularize.
The successful application of SciML to the simulation of human cardiac function is presented, a field of significant socioeconomic importance that poses numerous challenges on both the mathematical and computational fronts.
Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.
This paper introduces a new class of cooperative games that arise from production-inventory problems, where several agents have to cover their demand over a finite time horizon and shortages are allowed.
This research provides a mathematical framework to analyze multiplayer games with an arbitrary number of strategies on regular graphs by drawing an analogy with the Balls-and-Boxes problem, based on which it is shown that the local configuration of multiplayer games on graphs is equivalent to distributing k identical co-players among n distinct strategies.
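The counting step behind this equivalence is a stars-and-bars computation: distributing k identical co-players among n distinct strategies yields C(k + n - 1, n - 1) local configurations. A minimal sketch (parameter values illustrative):

```python
from itertools import combinations_with_replacement
from math import comb

def n_configurations(k, n):
    """Ways to distribute k identical co-players among n distinct
    strategies (stars and bars): C(k + n - 1, n - 1)."""
    return comb(k + n - 1, n - 1)

# Brute-force check: each configuration is a multiset of size k drawn
# from n strategies, so enumerate multisets and count them.
k, n = 4, 3
enumerated = sum(1 for _ in combinations_with_replacement(range(n), k))
print(n_configurations(k, n), enumerated)  # both 15
```

Because the count grows polynomially in k rather than exponentially, analyses over local configurations remain tractable for arbitrary numbers of strategies.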
This paper presents a comprehensive survey of 123 distinct variants of MOCSAs published in scientific journals and provides future research directions for MOCSA.