Browsing by Author "Rojo, Javier"
Now showing 1 - 7 of 7
Item
Density of rational points on K3 surfaces over function fields (2012-09-05)
Li, Zhiyuan; Hassett, Brendan E.; Wolf, Michael; Rojo, Javier
In this paper, we study sections of a Calabi-Yau threefold fibered over a curve by K3 surfaces. We show that there exist infinitely many isolated sections on certain K3-fibered Calabi-Yau threefolds and that the subgroup of the Néron-Severi group generated by these sections is not finitely generated. This also gives examples of K3 surfaces over the function field F of a complex curve with Zariski-dense F-rational points, whose geometric models are Calabi-Yau. Furthermore, we generalize our results to families of higher-dimensional Calabi-Yau varieties with Calabi-Yau ambient spaces.

Item
Dimension reduction methods with applications to high dimensional data with a censored response (2010)
Nguyen, Tuan S.; Rojo, Javier
Dimension reduction methods have come to the forefront of many applications where the number of covariates, p, far exceeds the sample size, N. For example, in survival analysis studies using microarray gene expression data, 10--30K expressions per patient are collected, but only a few hundred patients are available for the study. The focus of this work is on linear dimension reduction methods. Attention is given to the dimension reduction method of Random Projection (RP), in which the original p-dimensional data matrix X is projected onto a k-dimensional subspace using a random matrix Gamma. The motivation for RP is the Johnson-Lindenstrauss (JL) Lemma, which states that a set of N points in p-dimensional Euclidean space can be projected onto a k >= 24 ln N / (3 epsilon^2 - 2 epsilon^3) dimensional Euclidean space such that the pairwise distances between the points are preserved within a factor of 1 +/- epsilon. In this work, the JL Lemma is revisited when the random matrix Gamma is defined as standard Gaussian and Achlioptas-type.
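The random projection step described in this abstract can be illustrated with a minimal sketch using a standard Gaussian projection matrix scaled by 1/sqrt(k), so that squared pairwise distances are preserved in expectation. This is a generic illustration of the technique, not code from the thesis; all names are ours.

```python
import numpy as np

def random_projection(X, k, rng=None):
    """Project the rows of X (N x p) onto k dimensions with a standard
    Gaussian random matrix Gamma, scaled so that squared pairwise
    distances are preserved in expectation."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    Gamma = rng.standard_normal((p, k)) / np.sqrt(k)
    return X @ Gamma

# Illustration: a pairwise distance before and after projection.
rng = np.random.default_rng(0)
N, p, k = 50, 10_000, 1_000
X = rng.standard_normal((N, p))
Y = random_projection(X, k, rng=rng)

d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Y[0] - Y[1])
# With k this large, the ratio concentrates near 1 (within 1 +/- epsilon).
print(d_proj / d_orig)
```

The JL Lemma guarantees that, for k at least of order ln N / epsilon^2, all pairwise distance ratios lie in [1 - epsilon, 1 + epsilon] with high probability; the thesis sharpens this lower bound on k.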
An improvement on the lower bound for k is provided by working directly with the distributions of the random distances rather than resorting to the moment generating function technique used in the literature. An improvement on the lower bound for k is also provided when using pairwise L2 distances in the space of the original points and pairwise L1 distances in the space of the projected points. Another popular dimension reduction method is Partial Least Squares. In this work, a variant of Partial Least Squares is proposed, denoted Rank-based Modified Partial Least Squares (RMPLS). The weight vectors of RMPLS can be seen to be the solution to an optimization problem. The method is insensitive to outlying values of both the response and the covariates, and takes the censoring information into account in the construction of its weight vectors. Results from simulated and real datasets under the Cox and Accelerated Failure Time (AFT) models indicate that RMPLS outperforms other leading methods on various measures when outliers are present in the response, and is comparable to other methods in the absence of outliers.

Item
Heavy-tailed densities (Wiley, 2013)
Rojo, Javier
The concept of heavy- or long-tailed densities (or distributions) has attracted much well-deserved attention in the literature. A quick search in Google using the keywords ‘long-tailed statistics’ retrieves almost 12 million items. The concept has become a pillar of the theory of extremes, and through its connection with outlier-prone distributions, long-tailed distributions also play a central role in the theory of robustness. The concept of tail heaviness is by now ubiquitous, appearing in a diverse set of disciplines that includes economics, communications, atmospheric sciences, climate modeling, social sciences, physics, and the modeling of complex systems. Nevertheless, the precise meaning of ‘long’ or ‘heavy’ tails remains somewhat elusive.
Thus, in a substantial portion of the early literature, long-tailedness meant that the underlying distribution was capable of producing anomalous observations, in the sense that they were ‘too far’ from the main body of observations. Implicit in these informal definitions was the notion that any distribution that behaved that way had to do so because its tails were longer than those of the normal distribution. This paper discusses tail orderings and several approaches for the classification of probability distributions according to tail heaviness. It is concluded that an approach based on the limiting behavior of the residual life function, and its corresponding characterizations based on functions of regular variation and the asymptotic distribution of extreme spacings, provides the more natural and illuminating concepts of tail behavior.

Item
Minimum Distance Estimation in Categorical Conditional Independence Models (2012)
Kahle, David John; Rojo, Javier
One of the oldest and most fundamental problems in statistics is the analysis of cross-classified data, called contingency tables. Analyzing contingency tables is typically a question of association: do the variables represented in the table exhibit special dependencies or lack thereof? The statistical models which best capture these experimental notions of dependence are the categorical conditional independence models; however, until recent discoveries concerning the strongly algebraic nature of the conditional independence models surfaced, the models were widely overlooked due to their unwieldy implicit description. Apart from the inferential question above, this thesis asks a more basic question: supposing such an experimental model of association is known, how can one incorporate this information into the estimation of the joint distribution of the table?
In the traditional parametric setting, several estimation paradigms have been developed over the past century; however, traditional results are not applicable to arbitrary categorical conditional independence models due to their implicit nature. After laying out the framework for conditional independence and algebraic statistical models, we consider three aspects of estimation in these models using the minimum Euclidean (L2E), minimum Pearson chi-squared, and minimum Neyman modified chi-squared distance paradigms, as well as the more ubiquitous maximum likelihood approach (MLE). First, we consider the theoretical properties of the estimators and demonstrate that under general conditions the estimators exist and are asymptotically normal. For small samples, we present the results of large-scale simulations to address the estimators' bias and mean squared error (in the Euclidean and Frobenius norms, respectively). Second, we identify the computation of such estimators as an optimization problem and, for the case of the L2E, propose two different methods by which the problem can be solved, one algebraic and one numerical. Finally, we present an R implementation via two novel packages: mpoly, for symbolic computing with multivariate polynomials, and catcim, for fitting categorical conditional independence models. It is found that, in general, minimum distance estimators in categorical conditional independence models behave as they do in the more traditional parametric setting and can be computed in many practical situations with the implementation provided.

Item
Nonparametric estimation of bivariate mean residual life function (2005)
Ghebremichael, Musie S.; Rojo, Javier
In survival analysis, the additional lifetime that an object survives past a time t is called the residual life function of the object.
Mathematically speaking, if the lifetime of the object is described by a random variable T, then the random variable R(t) = [T - t | T > t] is called the residual life random variable. The quantity e(t) = E(R(t)) = E[T - t | T > t] is called the mean residual lifetime (mrl) function, or the life expectancy at age t. There are numerous situations where the bivariate mrl function is important. Times to death or times to initial contraction of a disease may be of interest for litter-mate pairs of rats or for twin studies in humans. The time to a deterioration level or the time to reaction to a treatment may be of interest in pairs of lungs, kidneys, breasts, eyes, or ears of humans. In reliability, the distribution of the life lengths of a particular pair of components in a system may be of interest. Because of the dependence among the event times, we cannot get reliable results by using the univariate mrl function on each event time in order to study the aging process. The bivariate mrl function is useful in analyzing the joint distribution of two event times when these times are dependent. In recent years, though considerable attention has been paid to the univariate mrl function, relatively little research has been devoted to the analysis of the bivariate mrl function. The specific contribution of this dissertation consists in proposing, and examining the properties of, nonparametric estimators of the bivariate mean residual life function when a certain order among such functions exists. That is, we consider the problem of nonparametric estimation of a bivariate mrl function when it is bounded from above by another, known or unknown, mrl function. The estimators under such an order constraint are shown to perform better than the empirical mrl function in terms of mean squared error. Moreover, they are shown to be projections, onto an appropriate space, of the empirical mean residual life function.
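The univariate mrl function e(t) = E[T - t | T > t] mentioned above has a simple empirical estimator in the uncensored case: average T - t over the observations still "alive" at t. A minimal sketch, with illustrative names of our own (the dissertation's estimators handle the bivariate, order-constrained case, which this toy example does not):

```python
import numpy as np

def empirical_mrl(times, t):
    """Empirical mean residual life at age t for uncensored data:
    the average of (T - t) over observations with T > t."""
    times = np.asarray(times, dtype=float)
    alive = times[times > t]
    if alive.size == 0:
        return 0.0
    return float(np.mean(alive - t))

# For Exp(1) lifetimes the mrl is constant at 1 by memorylessness,
# so the empirical estimate should sit near 1 at moderate ages.
rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=200_000)
print(empirical_mrl(sample, 0.5))
```

The exponential check is a convenient sanity test precisely because its lack of memory makes e(t) constant; heavier-tailed lifetimes have e(t) increasing in t.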
Under suitable technical conditions, the asymptotic theory of these estimators is derived. Finally, the procedures are applied to a data set on bivariate survival. More specifically, we have used the Diabetic Retinopathy Study (DRS) data to illustrate our estimators. In this data set, the survival times of both the left and right eyes are given for two groups of patients: juvenile and adult diabetics. Thus, it seems natural to assume that the mrl for the juvenile diabetics is longer than the mrl of the adult diabetics. Under this assumption, we calculated the estimators of the mrl function for each group. We have also calculated the empirical mrl functions of the two groups and compared them with the estimators of the mrl function obtained under the above assumption.

Item
On the operating characteristics of some non-parametric methodologies for the classification of distributions by tail behavior (2005)
Ott, Richard Charles; Rojo, Javier
New methods for classifying tails of probability distributions based on data are proposed. Some methods apply the nonparametric theories of Rojo [35] and Schuster [36] and differ from classical extreme value theory and other well-established methods. All the methods implement the extreme spacing of the data, the difference between the largest and second-largest values. The results are then compared, on the basis of power properties, to the classical technique of a Points Over Threshold model based on the Generalized Pareto Distribution (GPD). The following topics are the foundation of this thesis. Chapter 1. Review of classical extreme value theory and discussion of the class of medium-tailed distributions. Chapter 2. Review of the tail classification schemes of Parzen, Schuster, and Rojo, the latter two of which suggest the usage of the Extreme Spacing (ES) as a possible classifying instrument. Additional subcategorizations are also provided for the schemes of Schuster and Rojo. Chapter 3.
Review of estimation methods for the Points Over Threshold GPD parameters for classification purposes. A Monte Carlo study classifying the tails of many common distributions using the GPD by way of maximum likelihood is also provided. Chapter 4. Three classification tests based on the ES are provided. The first is a test to decide whether a sample originates from a completely specified distribution such as Exp(1). The second classifies whether the data originated from an exponential distribution with unknown parameter. The third classifies an underlying distribution as short-, medium-, or long-tailed. Also discussed is the potential benefit of blocking the data before applying the above-mentioned tests. Chapter 5. Classifying specific data sets by way of the new methods. Some of the new ES methods may be applicable when classical methods are not, for example when the GPD maximum likelihood numerical algorithm does not converge to yield a shape parameter estimate, or when the variance of the shape parameter cannot be estimated because the parameter estimate is close to an endpoint of the parameter space. Even when classical methods are applicable, these tests can give a more thorough understanding of the tail behavior of the underlying distribution.

Item
Reliability and risk assessment of networked urban infrastructure systems under natural hazards (2013-09-16)
Rokneddin, Keivan; Dueñas-Osorio, Leonardo; Padgett, Jamie E.; Rojo, Javier; Wickham, Hadley
Modern societies increasingly depend on the reliable functioning of urban infrastructure systems in the aftermath of natural disasters such as hurricane and earthquake events. Apart from sizable capital for maintenance and expansion, the reliable performance of infrastructure systems under extreme hazards also requires strategic planning and effective resource assignment.
Hence, efficient system reliability and risk assessment methods are needed to provide insights that help system stakeholders understand infrastructure performance under different hazard scenarios and accordingly make informed decisions in response to them. Moreover, efficient assignment of limited financial and human resources for maintenance and retrofit actions requires new methods to identify critical system components under extreme events. Infrastructure systems such as highway bridge networks are spatially distributed systems with many linked components. Therefore, network models describing them as mathematical graphs with nodes and links naturally apply to the study of their performance. Owing to their complex topology, general system reliability methods are ineffective for evaluating the reliability of large infrastructure systems. This research develops computationally efficient methods, such as a modified Markov Chain Monte Carlo simulation algorithm for network reliability, and proposes a network reliability framework (BRAN: Bridge Reliability Assessment in Networks) that is applicable to large and complex highway bridge systems. Since the responses of system components to hazard scenario events are often correlated, the BRAN framework enables accounting for correlated component failure probabilities stemming from different correlation sources. Failure correlations from non-hazard sources are particularly emphasized, as they potentially have a significant impact on network reliability estimates, and yet they have often been ignored or only partially considered in the literature on infrastructure system reliability. The developed network reliability framework is also used for probabilistic risk assessment, where network reliability serves as the network performance metric.
Risk analysis studies may require a prohibitively large number of simulations for large and complex infrastructure systems, as they involve evaluating the network reliability for multiple hazard scenarios. This thesis addresses this challenge by developing network surrogate models using statistical learning tools such as random forests. The surrogate models can replace network reliability simulations in a risk analysis framework and significantly reduce computation times. The proposed approach therefore provides an alternative to established methods for enhancing the computational efficiency of risk assessments: it builds a surrogate model of the complex system at hand rather than reducing the number of analyzed hazard scenarios through hazard-consistent scenario generation or importance sampling. Nevertheless, the application of surrogate models can be combined with scenario reduction methods to improve the analysis efficiency even further. To address the problem of prioritizing system components for maintenance and retrofit actions, two advanced metrics are developed in this research to rank the criticality of system components. Both metrics combine system component fragilities with the topological characteristics of the network, and provide rankings that are either conditioned on specific hazard scenarios or probabilistic, depending on the preference of infrastructure system stakeholders. Both also offer enhanced efficiency and practical applicability compared to existing methods. The developed frameworks for network reliability evaluation, risk assessment, and component prioritization are intended to address important gaps in state-of-the-art management and planning for infrastructure systems under natural hazards. Their application can enhance public safety by informing the decision-making process for expansion, maintenance, and retrofit actions for infrastructure systems.
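The network reliability simulations that the surrogate models stand in for can be illustrated with a minimal Monte Carlo sketch: sample component (edge) failures, then check source-terminal connectivity over the surviving graph. This toy example assumes independent edge failures only; the BRAN framework described above additionally handles correlated failures, which this sketch omits. All names are illustrative.

```python
import numpy as np
from collections import deque

def connected(n_nodes, edges, up, s, t):
    """BFS from s to t over the surviving (up) edges."""
    adj = {i: [] for i in range(n_nodes)}
    for (u, v), ok in zip(edges, up):
        if ok:
            adj[u].append(v)
            adj[v].append(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def mc_reliability(n_nodes, edges, p_fail, s, t, n_sim=20_000, seed=0):
    """Monte Carlo estimate of P(s connected to t) when each edge
    fails independently with probability p_fail[i]."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        up = rng.random(len(edges)) > p_fail
        hits += connected(n_nodes, edges, up, s, t)
    return hits / n_sim

# Classic 5-edge "bridge" network between nodes 0 and 3; with edge
# reliability 0.9 the exact two-terminal reliability is about 0.978.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
p_fail = np.full(len(edges), 0.1)
print(mc_reliability(4, edges, p_fail, s=0, t=3))
```

Each network evaluation here is cheap, but for large bridge networks under many hazard scenarios this inner loop is what becomes prohibitive, which is the motivation for replacing it with a trained surrogate such as a random forest.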