We propose a novel method for efficient target audience augmentation in programmatic digital advertising. The method relies on ParGenFS, an algorithm for most adequate generalization in taxonomies that the authors developed in earlier joint work. ParGenFS extends user segments by parsimoniously lifting them, off-line, as a fuzzy set over the IAB content taxonomy into a higher-rank ‘head subject’. The algorithm was initially intended as an intelligent information retrieval tool. Here it is applied to the very different task of targeted advertising, as an effective tool for augmenting audiences.
Data envelopment analysis (DEA) methods are usually used to assess the disaster vulnerability of regions. However, most of these methods work with precise values of all the characteristics of the regions. In real life, much of the data often consists of expert estimates or approximate values. We therefore propose modified DEA methods that take the inaccuracy of the data into account. We apply these methods to the evaluation of wildfire preventive measures in regions of the Russian Federation.
This paper presents a relatively rare case of an optimization problem in data analysis that admits a globally optimal solution by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of a domain taxonomy represented by a rooted tree. The idea is to “lift” the set to its “head subject” in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors, either “gaps” or “offshoots” or both. Our method globally minimizes a penalty function combining the numbers of head subjects, gaps and offshoots, differently weighted. We apply this to a collection of 17645 research papers on Data Science published in 17 Springer journals over the past 20 years. We extract a taxonomy of Data Science (TDS) from the international Association for Computing Machinery Computing Classification System 2012. We find fuzzy clusters of leaf topics over the text collection, optimally lift them to head subjects in TDS, and comment on the tendencies of current research following from the lifting results.
This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks of the taxonomy. The head subjects should cover the query set, with the possible addition of some “gaps”, taxonomy nodes covered by the head subject but irrelevant to the query set. To decrease the numbers of gaps, we admit some “offshoots”, nodes belonging to the query set but not covered by the head subject. The method globally minimizes the total number of head subjects, gaps and offshoots, each suitably weighted. Our algorithm is applied to the structural analysis and description of a collection of 17685 abstracts of research papers published in 17 Springer journals related to Data Science for the 20-year period 1998-2017. Our taxonomy of Data Science (TDS) is extracted from the Association for Computing Machinery Computing Classification System 2012 (ACM-CCS), a six-level hierarchical taxonomy manually developed by a team of ACM experts. The TDS also includes a number of additional leaves that we added to cater for recent developments not represented in the ACM-CCS taxonomy. We find fuzzy clusters of leaf topics over the text collection, using specially developed machinery. Three of the clusters are indeed thematic, relating to the Data Science sub-areas of (a) learning, (b) information retrieval, and (c) clustering. These three clusters are then lifted in the TDS using ParGenFS, which allows us to draw some conclusions about tendencies in developments in these areas.
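The lifting idea described above can be illustrated with a minimal sketch. This is not the authors' ParGenFS implementation: it assumes a crisp (non-fuzzy) query set and unit node weights, so that each head subject costs 1, each gap costs `lam` and each offshoot costs `gamma`. The function name `lift`, the dictionary layout and the parameter names are hypothetical. A gap is counted here as a maximal subtree containing no query leaves under a head subject.

```python
def _result(relevant, max_gaps, penalty, heads, gaps, offshoots):
    return {"relevant": relevant, "max_gaps": max_gaps, "penalty": penalty,
            "heads": heads, "gaps": gaps, "offshoots": offshoots}


def lift(children, query, node, lam, gamma):
    """Cheapest lifting of the crisp leaf set `query` inside the subtree at `node`.

    children: dict mapping each internal node to the list of its children.
    Assumed costs (simplification): head subject = 1, gap = lam, offshoot = gamma.
    """
    kids = children.get(node, [])
    if not kids:  # leaf
        if node in query:
            # A query leaf not covered by any head subject is an offshoot.
            return _result(True, [], gamma, [], [], [node])
        # A leaf outside the query would be a gap if an ancestor became a head.
        return _result(False, [node], 0.0, [], [], [])

    parts = [lift(children, query, k, lam, gamma) for k in kids]
    if not any(p["relevant"] for p in parts):
        # No query leaf below: the whole subtree is one maximal gap candidate.
        return _result(False, [node], 0.0, [], [], [])

    max_gaps = [g for p in parts for g in p["max_gaps"]]
    # Option (a): make `node` a head subject and pay for every gap below it.
    as_head = 1.0 + lam * len(max_gaps)
    # Option (b): keep the best solutions already found for the children.
    as_union = sum(p["penalty"] for p in parts)
    if as_head < as_union:
        return _result(True, max_gaps, as_head, [node], list(max_gaps), [])
    return _result(True, max_gaps, as_union,
                   [h for p in parts for h in p["heads"]],
                   [g for p in parts for g in p["gaps"]],
                   [o for p in parts for o in p["offshoots"]])


if __name__ == "__main__":
    tree = {"root": ["A", "B"],
            "A": ["a1", "a2", "a3", "a4"],
            "B": ["b1", "b2", "b3", "b4"]}
    print(lift(tree, {"a1", "a2", "a3", "b1"}, "root", lam=0.3, gamma=0.6))
```

In the toy tree, three of the four leaves of `A` belong to the query, so lifting to `A` with one gap is cheaper than three offshoots, whereas the single query leaf under `B` is cheaper to keep as an offshoot. Because the penalty is additive over subtrees, the bottom-up recursion yields a global minimum for this simplified cost.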
Climate change in the Arctic has several important consequences. On the one hand, fossil fuels can be exploited much more easily than before. On the other hand, their extraction poses serious potential threats to fishing by changing natural habitats, which in turn seriously damages the countries’ economies. Another set of problems arises from the extension of the navigable season for shipping routes. Thus, there are already discussions on how resources should be allocated among countries. In Aleskerov and Victorova (An analysis of potential conflict zones in the Arctic Region, HSE Publishing House, Moscow, 2015) a model was presented analyzing preferences of the countries interested in natural resources and revealing potential conflicts among them. We present several area allocation models based on different preferences over resources among interested countries. As a result, we constructed several allocations where areas are assigned to countries with respect to the distance or the total interest, or according to a procedure that is a counterpart of the Adjusted Winner procedure. We consider this work an attempt to help decision-making authorities in their complex work on adjusting preferences and conducting negotiations in the Arctic zone. We would like to emphasize that these models can easily be extended to a larger number of parameters, to the case when some areas should for some reason be excluded from consideration, and to the case with ‘weighted’ preferences with respect to some parameters. We strongly believe that such models and evaluations based on them can be helpful for the corresponding decision-making process.
We give a survey of approaches for analyzing the sensitivity of non-dominated alternatives to changes in the parameters of partial quasi-orderings that define preferences. Such parameters can include values of importance coefficients for different criteria or boundaries of interval estimates of the degrees of superiority in the importance of some criteria over others, boundaries of intervals of criteria value tradeoffs uncertainty, and others.
The paper develops a new extension of the sequential preference condition, which leads to unique stable matching in all subpopulations, obtained by consistent restrictions of the marriage matching problem. Under the new condition, the Gale–Shapley algorithm is stable, consistent, strategy-proof, Pareto optimal for men, and Pareto optimal for women.
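For readers unfamiliar with the Gale–Shapley algorithm mentioned above, the following is a minimal sketch of the standard men-proposing deferred acceptance procedure. It assumes equal numbers of men and women and complete strict preference lists. The function name and data layout are illustrative, and the sketch does not implement the paper's new preference condition.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance.

    men_prefs / women_prefs: dict mapping each agent to a list of all agents
    on the other side, most preferred first. Returns a dict woman -> man.
    """
    # rank[w][m] is the position of m in w's list (lower means preferred).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to propose to
    engaged = {}                             # woman -> current partner
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m                   # w accepts her first proposal
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])          # w trades up; old partner is free again
            engaged[w] = m
        else:
            free.append(m)                   # w rejects m; he proposes again later
    return engaged


def is_stable(matching, men_prefs, women_prefs):
    """Check that no man and woman prefer each other to their assigned partners."""
    partner_of_man = {m: w for w, m in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == partner_of_man[m]:
                break                        # remaining women are worse for m
            w_list = women_prefs[w]
            if w_list.index(m) < w_list.index(matching[w]):
                return False                 # (m, w) is a blocking pair
    return True


if __name__ == "__main__":
    men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
    women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
    match = gale_shapley(men, women)
    print(match, is_stable(match, men, women))
```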
The paper shows how non-additive measures and the theory of belief functions can be applied to a number of problems in analyzing the significance of, and conflict between, the positions of political parties. The study was performed on a database of online polls of parties in Germany before the 2013 elections to the Bundestag, together with the results of these elections. We show that it is possible to find the groups of issues most significant for voting, evaluate the political heterogeneity of society, assess the importance of the positions of individual parties for voting, and assess the conflict between party positions on important issues.
This paper considers the possibility of using belief function theory for developing trading strategies. The approach is analyzed on data from the Russian stock market. The belief and plausibility functions (and their corresponding bodies of evidence) for the system’s recommendations (buy, sell or hold) are calculated using fuzzy inference methods for technical indicators. These bodies of evidence are then aggregated using combining rules (Dempster’s rule, Yager’s rule and others). The discount coefficients of the bodies of evidence are calculated at the learning stage so as to maximize the profitability of the trading strategy. The intervals for buying or selling assets are determined from the results of this combination. The decision about the corresponding action is taken after comparing these intervals. The study showed that the proposed approach provides an interesting result.
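As an illustration of the combination step, here is a minimal sketch of Dempster's rule together with classical (Shafer) discounting and the belief and plausibility functions, over the frame {buy, sell, hold}. The mass values and the discount coefficient are made-up numbers for illustration only. The fuzzy inference step and the learning of discount coefficients described in the abstract are not reproduced.

```python
def discount(mass, alpha, frame):
    """Shafer discounting: move a share `alpha` of every mass to the whole frame."""
    out = {a: (1.0 - alpha) * v for a, v in mass.items() if a != frame}
    out[frame] = (1.0 - alpha) * mass.get(frame, 0.0) + alpha
    return out


def dempster_combine(m1, m2):
    """Dempster's rule; returns (combined mass function, conflict K)."""
    combined, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            c = a & b
            if c:
                combined[c] = combined.get(c, 0.0) + x * y
            else:
                conflict += x * y            # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {c: v / (1.0 - conflict) for c, v in combined.items()}, conflict


def belief(mass, a):
    """Total mass of focal elements contained in `a`."""
    return sum(v for b, v in mass.items() if b <= a)


def plausibility(mass, a):
    """Total mass of focal elements intersecting `a`."""
    return sum(v for b, v in mass.items() if b & a)


if __name__ == "__main__":
    frame = frozenset({"buy", "sell", "hold"})
    buy, sell = frozenset({"buy"}), frozenset({"sell"})
    m1 = {buy: 0.6, frame: 0.4}              # hypothetical evidence from indicator 1
    m2 = {sell: 0.5, frame: 0.5}             # hypothetical evidence from indicator 2
    m, k = dempster_combine(m1, m2)
    print(k, belief(m, buy), plausibility(m, buy))
    print(discount(m1, 0.2, frame))
```

The interval [belief, plausibility] computed for "buy" and for "sell" is the kind of interval that a decision rule could then compare.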
In real applications, it is sometimes necessary to evaluate the inner or external conflict of pieces of evidence. However, these numerical values cannot explain why the conflict occurs. Thus, a deeper analysis of the available information is needed. In this paper, we propose clustering a given body of evidence into pieces of evidence so as to achieve the highest conflict among the pieces and the smallest inner conflict within each piece, based on several functionals that evaluate inner and external conflict.
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to as “gaps” and “offshoots”. Our method, ParGenFS, globally minimizes a penalty function combining the numbers of head subjects and gaps and offshoots, differently weighted. Two applications are considered: (1) analysis of tendencies of research in Data Science; (2) audience extending for programmatic targeted advertising online. The former involves a taxonomy of Data Science derived from the celebrated ACM Computing Classification System 2012. Based on a collection of research papers published by Springer 1998–2017, and applying in-house methods for text analysis
retrieval and clustering. The head subjects of these clusters inform us of some general tendencies of the research. The latter involves the publicly available IAB Tech Lab Content Taxonomy. Each of about 25 million users is assigned a fuzzy profile within this taxonomy, which is generalized offline using ParGenFS. Our experiments show that these head subjects effectively extend the size of targeted audiences at least twofold without losing quality.
Explicit two-level in time and symmetric in space finite-difference schemes constructed by approximating the 1D barotropic quasi-gas-/quasi-hydrodynamic systems of equations are studied. The schemes are linearized about a constant solution with a nonzero velocity, and, for them, necessary and sufficient conditions for the L2-dissipativity of solutions to the Cauchy problem are derived depending on the Mach number. These conditions differ from one another by at most a factor of two. The results substantially develop those known for the linearized Lax–Wendroff scheme. Numerical experiments are performed to analyze the applicability of the found conditions in the nonlinear formulation to several schemes for different Mach numbers.
This book concentrates on in-depth explanation of a few methods to address core issues, rather than presentation of a multitude of methods that are popular among the scientists. An added value of this edition is that I am trying to address two features of the brave new world that materialized after the first edition was written in 2010. These features are the emergence of “Data science” and changes in student cognitive skills in the process of global digitalization. The birth of Data science gives me more opportunities in delineating the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) as synonymous. There are, however, at least two differences between the two. First comes the difference in perspectives. ML is to equip computers with methods and rules to see through regularities of the environment, and behave accordingly. DA is to enhance conceptual understanding. These goals are indeed not inconsistent, which explains a huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent. Regarding the current students’ cognitive habits, I came to the conclusion that they prefer to immediately get into the “thick of it”. Therefore, I streamlined the presentation of multidimensional methods. These methods are now organized in four Chapters, one of which presents correlation learning (Chapter 3). Three other Chapters present summarization methods both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 relates to finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 relates to hierarchical and separative cluster structures.
Using the encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods that are used for addressing such practical issues as the analysis of mixed scale data, data standardization, the number of clusters, cluster interpretation, etc. An obvious bias towards summarization against correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction, and, second, by my personal preferences. Categorical summarization, that is, clustering, is considered not just a method of DA but rather a model of classification as a concept in knowledge engineering. Also, in this edition, I somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go. Chapter 1 presents the author’s view on the DA mainstream, or core, as well as on a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not a well-formed part of knowledge yet but rather a piece of science-in-the-making.
Ranking is an important part of several areas of contemporary research, including social sciences, decision theory, data analysis and information retrieval. The goal of this project is to align developments in quantitative social sciences and decision theory with the current thought in computer science, including a few novel results. Specifically, we consider binary preference relations, the so-called weak orders that are in one-to-one correspondence with rankings. We show that the conventional symmetric difference distance between weak orders, considered as sets of ordered pairs, coincides with the celebrated Kemeny distance between the corresponding rankings, despite the seemingly much simpler structure of the former. Based on this, we review several properties of the geometric space of weak orders involving the ternary relation “between”, and contingency tables for cross-partitions. Next we reformulate the consensus ranking problem as a variant of finding an optimal linear ordering, given a correspondingly defined consensus matrix. The difference is in a subtracted term, the partition concentration, that depends only on the distribution of the objects in the individual parts. We apply our results to the conventional Likert scale to show that the Kemeny consensus rule is rather insensitive to the data under consideration and, therefore, should be supplemented with more sensitive consensus schemes.
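The stated coincidence of the two distances can be checked numerically with a small sketch. It assumes one common convention, which may differ in detail from the paper's: a ranking assigns each object a rank (lower is better, equal ranks are ties), the weak order is the set of ordered pairs (i, j) with i ranked no worse than j, and the Kemeny distance is half the sum of absolute differences between the {-1, 0, 1} pairwise comparison matrices. All function names are illustrative.

```python
from itertools import permutations


def weak_order(rank):
    """Set of ordered pairs (i, j) such that i is ranked no worse than j."""
    return {(i, j) for i in rank for j in rank if rank[i] <= rank[j]}


def sym_diff_distance(r, s):
    """Size of the symmetric difference between the two weak orders."""
    return len(weak_order(r) ^ weak_order(s))


def _sign(x):
    return (x > 0) - (x < 0)


def kemeny_distance(r, s):
    """Half the sum of |a_ij - b_ij| over ordered pairs of distinct objects,
    where a_ij is 1 if i precedes j, -1 if j precedes i, and 0 for a tie."""
    total = sum(abs(_sign(r[j] - r[i]) - _sign(s[j] - s[i]))
                for i, j in permutations(r, 2))
    return total // 2  # each unordered pair contributes twice, so total is even


if __name__ == "__main__":
    r = {"a": 1, "b": 2, "c": 3}   # a before b before c
    s = {"a": 1, "b": 1, "c": 2}   # a and b tied, both before c
    print(sym_diff_distance(r, s), kemeny_distance(r, s))
```

A strict reversal of one pair contributes 2 to both distances, and a strict preference set against a tie contributes 1, which is why the two values agree pair by pair under this convention.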
A decision-support tool for estimating the volume of investment in developing a regional energy/freight transportation infrastructure is proposed. The tool provides the estimates of the required investment volume and those of the expected amount of revenue that the infrastructure functioning may generate. These estimates are key ones in negotiations with private investors on forming a potential public–private partnership to finance the infrastructure development. The tool includes (a) a mathematical model underlying the formulations of three optimization problems on its basis depending on the information available to the decision-makers—two mixed programming problems and a minimax problem, which is proven to be reducible to a mixed programming one with all integer variables being Boolean, (b) a standard software package for solving mixed programming problems, and (c) a software package for processing data. The results of testing the proposed tool on sets of model data taken from open sources are discussed.
The problem of developing a chain of charging stations for electric vehicles along a highway crossing a geographic region is considered. A tool for determining an optimal structure of this chain is proposed. The use of the tool, particularly, allows one to estimate the cost of (and thus the needed volume of investment for) developing the chain proceeding from a) the demand for electricity in the chain, b) the existing technological and legal requirements to the structure of such a chain, c) the expected production capacities of all the types of renewable sources of energy, which can effectively be deployed at each charging station from the chain, and d) the cost of the equipment to be acquired and installed at each charging station to provide the chain customers with electricity to be received by each charging station in the chain from both electrical grids and renewable sources of energy, the cost of maintaining this equipment, and the cost of operating it. The problem under consideration is formulated as a nonlinear mixed programming one of maximizing the minimum function of a sum of several linear and two bilinear functions of vector arguments. It is proven that under certain natural and verifiable assumptions, finding solutions to this problem turns out to be reducible to solving either a mixed programming problem with linear constraints or a linear programming problem and an integer programming one. For a set of model data, an illustrative example of formulating and solving the problem under consideration is provided, and the way to use the tool in negotiations with potential investors in the project is discussed.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy that is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. The method globally minimizes a penalty combining head subjects, gaps and offshoots. We apply this to extract research tendencies from a collection of about 18000 research papers published in Springer journals on data science. We consider a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use thematic clusters’ head subjects to make some comments on the tendencies of research.
The paper is devoted to a cosmonaut training planning problem, which is a kind of resource-constrained project scheduling problem (RCPSP) with a new objective function. The training of each cosmonaut is divided into special courses. To avoid courses that are too sparse, we introduce a special objective function, the weighted total sparsity of training courses. This non-regular objective function requires the development of new methods that differ from methods for solving the thoroughly studied RCPSP with the makespan criterion. New heuristic algorithms for solving this problem are proposed. Their efficiency is verified on real-life data. In a reasonable time, the algorithms let us find a solution that is better than the solution found with the help of the solver CPLEX CP Optimizer.
The paper studies group-separable preference profiles. A profile is group-separable if for each subset of alternatives there is a partition into two parts such that each voter prefers each alternative in one part to each alternative in the other part. We develop a parenthesization representation of the group-separable domain. A precise formula for the number of group-separable preference profiles is obtained. A recursive formula for the number of narcissistic group-separable preference profiles is also obtained. A profile is narcissistic group-separable if it is group-separable and each alternative is preferred the most by exactly one voter.
We develop a model of pork-barrel politics in which a government official tries to improve her reelection chances by spending on targeted interest groups. The spending signals that she shares their concerns. We investigate the effect of such pandering on public spending. Pandering increases spending relative to a non-accountable official (one who does not have to run for reelection) if either the official's overall spending propensity is known, or if it is unknown but the effect of spending on the deficit is opaque to voters. By contrast, an unknown spending propensity may induce the elected official to exhibit fiscal discipline if spending is transparent.