NP-hard scheduling problems with the criterion of minimizing the maximum penalty, e.g. maximum lateness, are considered. For such problems, a metric that delivers an upper bound on the absolute error of the objective function value is introduced. For a given instance of such a problem, the introduced metric is used to determine the nearest instance for which a polynomial or pseudo-polynomial algorithm is known. A schedule is constructed for this nearest instance and then applied to the original instance. It is shown how this approach can be applied to various scheduling problems.
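A minimal illustrative sketch of the idea for a 1 || L_max-style setting (the function names, weights and instances below are ours, not the paper's): for any fixed job sequence, the maximum-lateness values of two instances differ by at most a sum of parameter differences, so solving a "nearby" easy instance bounds the error on the hard one.

```python
# Illustrative metric: for any fixed job sequence pi,
# |L_max(pi, A) - L_max(pi, B)| <= rho(A, B).
def rho(p_a, d_a, p_b, d_b):
    return sum(abs(pa - pb) for pa, pb in zip(p_a, p_b)) + \
           max(abs(da - db) for da, db in zip(d_a, d_b))

def l_max(order, p, d):
    """Maximum lateness of a given job order."""
    t, worst = 0, float("-inf")
    for j in order:
        t += p[j]
        worst = max(worst, t - d[j])
    return worst

# Instance A is "hard"; the nearby instance B is solvable by the EDD rule.
# B's optimal order is applied to A; the error on A is controlled by rho(A, B).
p_a, d_a = [3, 2, 4], [5, 4, 9]
p_b, d_b = [3, 2, 4], [4, 4, 9]
order_b = sorted(range(3), key=lambda j: d_b[j])   # EDD for instance B
print(l_max(order_b, p_a, d_a), rho(p_a, d_a, p_b, d_b))
```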
This work is devoted to a methodology for identifying structurally close objects of the “country_year” type on the basis of a system of indicators characterizing state capacity over 1996–2015. We compare clustering methods (including hierarchical clustering) with methods of pattern analysis based on pairwise comparison of indicators: ordinal-fixed and ordinal-invariant pattern clustering. We demonstrate that clustering and pattern analysis can be combined to obtain results that are interpretable from the point of view of political science. Groups of countries with similar development paths over the reference years are identified on the basis of a dynamic analysis of patterns. The dynamics of state capacity (with respect to the selected indicator system) are determined for 166 countries of the world.
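A toy sketch of one reading of ordinal pattern clustering, under the assumption that a pattern is the ordering of an object's indicator values (the objects and values below are hypothetical):

```python
# Objects sharing the same ordering of indicators fall into one pattern.
from collections import defaultdict

objects = {
    "A_2000": [0.7, 0.2, 0.5],
    "A_2005": [0.8, 0.3, 0.6],   # same indicator ordering as A_2000
    "B_2000": [0.1, 0.9, 0.4],
}

def ordinal_pattern(values):
    """Permutation that sorts the indicators: the ordinal-invariant signature."""
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))

patterns = defaultdict(list)
for name, vals in objects.items():
    patterns[ordinal_pattern(vals)].append(name)
print(dict(patterns))   # A_2000 and A_2005 share a pattern
```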
The Basel III regulation raised the minimum capital requirements for banks. However, its implementation may not have reduced systemic risks. Researchers investigating optimal banking regulation have not reached a consensus on whether to increase or decrease capital requirements. Here, we use an agent-based approach to study capital regulation and its implications for the evolution of the banking system. We use the Russian banking system to proxy the key model parameters. We find that lower capital requirements imply higher financial stability than the Basel III regime, in which the regulator requires banks to hold capital of more than 10% of their risk-weighted assets. However, the regulatory rule of merely having non-negative capital is the simplest solution that best fits heterogeneous economies. It produces the highest ratio of capital to assets, the smallest number of bank bankruptcies, and the lowest demand from banks to enter the interbank market to cover liquidity problems, for all the systems considered.
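A minimal sketch of the two regulatory regimes compared in the simulation, expressed as solvency predicates (function names and parameter values are ours, for illustration only):

```python
def basel_iii_compliant(capital, risk_weighted_assets, minimum=0.10):
    """Basel III-style rule: capital must exceed 10% of risk-weighted assets."""
    return capital >= minimum * risk_weighted_assets

def non_negative_capital(capital, risk_weighted_assets=None):
    """The simplest rule studied: the bank only needs non-negative capital."""
    return capital >= 0

# In an agent-based step, a bank failing its rule is closed by the regulator:
for rule in (basel_iii_compliant, non_negative_capital):
    print(rule.__name__, rule(capital=8.0, risk_weighted_assets=100.0))
```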
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, which is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. Our method involves two further automated analysis techniques: a fuzzy clustering method, FADDIS, involving both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated by frequencies. We apply this to extract research tendencies from two collections of research papers: (a) about 18,000 research papers published in Springer journals on data science over 20 years, and (b) about 27,000 research papers retrieved from Springer and Elsevier journals in response to data-science-related queries. We use a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System (ACM CCS 2012). Our findings allow us to make some comments on research tendencies that cannot be derived using more conventional techniques.
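A much-simplified sketch of the frequency-annotated relevance idea: the paper uses suffix trees, while here a plain substring-count dictionary stands in for the annotated tree, purely for illustration.

```python
from collections import Counter

def substring_counts(text, max_len=5):
    """Counts of all substrings of length <= max_len (the 'annotation')."""
    counts = Counter()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            counts[text[i:j]] += 1
    return counts

def relevance(topic, counts, max_len=5):
    """Average frequency of the topic's substrings in the annotated text."""
    subs = [topic[i:j] for i in range(len(topic))
            for j in range(i + 1, min(i + max_len, len(topic)) + 1)]
    return sum(counts[s] for s in subs) / len(subs)

counts = substring_counts("clustering methods for data clustering")
print(relevance("clustering", counts) > relevance("regression", counts))  # True
```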
We propose a novel method for efficient target audience augmentation in programmatic digital advertising. The method utilizes the ParGenFS algorithm for most adequate generalization in taxonomies, developed earlier by the authors in joint work. ParGenFS extends user segments by parsimoniously lifting them offline, as fuzzy sets over the IAB content taxonomy, to a higher-rank “head subject”. The algorithm was originally intended as an intelligent information retrieval tool; here it is applied to the very different task of targeted advertising, as an effective tool for augmenting audiences.
Global warming has several important consequences for the situation in the Barents Sea. Firstly, oil and gas drilling is becoming much easier than before. This may intensify discussions on disputed shelf zones where the deposits are located, especially near the Norway–Russia sea border. Secondly, oil and gas extraction poses potential threats to fishing by changing natural habitats, which in turn can seriously damage the economies involved.
We construct a model that helps to highlight potentially disputed territories and to analyze the preferences of the countries interested in fossil fuels and fish resources. We also compare different scenarios of resource allocation with the allocation under the current agreement.
Online courses, which appeared relatively recently, have rapidly gained popularity owing to the opportunities they offer. Very different people can study any discipline for various purposes. Online courses can be useful both to children preparing for lessons and to adults in advanced training. Gradually, courses are becoming not only part of the additional curriculum at universities but part of the mandatory program, too. However, not everyone supports this new mode of education. The goal of this work was therefore to identify students' attitudes towards online education, the reasons for their preferences regarding the online format, and their willingness to have traditional lectures replaced by an online format. The study was carried out on the basis of a survey of more than 6,000 students conducted as part of the annual Student Life Survey at HSE. The analysis used various clustering methods, such as hierarchical clustering, clustering with the K-means method, and latent class analysis, as well as analysis of variance. The students were divided into 6 clusters based on their attitudes towards replacing all lectures with an online format: devotees of HSE, amateurs of online courses, the disciplined, the social, learners for grades, and a mixed cluster.
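A sketch of the clustering step, assuming survey answers coded numerically (scikit-learn is used here for illustration; the data are synthetic, and only the number of clusters follows the abstract):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
answers = rng.integers(1, 6, size=(6000, 10)).astype(float)  # 10 Likert items

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(answers)
sizes = np.bincount(kmeans.labels_)
print(sizes)  # sizes of the six attitude clusters
```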
In this article, we consider the problem of planning maintenance operations at a locomotive maintenance depot. There are three types of tracks at the depot: buffer tracks, access tracks and service tracks. A depot consists of at most one buffer track and a number of access tracks, each of which ends with one service track. Each of these tracks has a limited capacity measured in locomotive sections. We present a constraint programming model and a greedy algorithm for solving the problem of planning maintenance operations. Using realistic data based on the operation of several locomotive maintenance depots in the Eastern polygon of Russian Railways, we carry out numerical experiments to compare the presented approaches.
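A sketch of one plausible greedy rule for such a problem (the track layout, capacities and tie-breaking below are illustrative assumptions, not the paper's algorithm): requests sorted by due time are placed on the first service track with enough free sections.

```python
def greedy_schedule(requests, tracks):
    """requests: (loco_id, sections, due); tracks: {track: free_sections}."""
    plan, rejected = [], []
    for loco, sections, due in sorted(requests, key=lambda r: r[2]):
        track = next((t for t, free in tracks.items() if free >= sections), None)
        if track is None:
            rejected.append(loco)            # goes to the buffer track, if any
        else:
            tracks[track] -= sections
            plan.append((loco, track))
    return plan, rejected

plan, rejected = greedy_schedule(
    [("L1", 2, 5), ("L2", 3, 3), ("L3", 2, 4)],
    {"service_1": 4, "service_2": 2},
)
print(plan, rejected)
```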
We apply Dempster-Shafer theory in order to reveal important elements in undirected weighted networks. We estimate the cooperation of each node with different groups of surrounding vertices via the construction of belief functions. The obtained intensities of cooperation are then redistributed over all elements of a particular group of nodes, which results in pignistic probabilities of node-to-node interactions. Finally, the pairwise interactions are aggregated into a centrality vector that ranks nodes with respect to the derived values. We also adapt the proposed model to multiplex networks. In networks of this type, nodes can be connected with each other differently on several levels of interaction. Various combination rules help to analyze such systems as a single entity, which has many advantages in the study of complex systems. In particular, Dempster's rule takes into account the inconsistency in the initial data, which has an impact on the final centrality ranking. We also provide a numerical example that illustrates the distinctive features of the proposed model. Additionally, we establish analytical relations between the proposed measure and classical centrality measures for particular graph configurations.
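A minimal sketch of the pignistic step: masses assigned to groups of neighbours are redistributed uniformly inside each group (the masses below are hypothetical; in the model they are built from edge weights):

```python
def pignistic(masses):
    """masses: {frozenset_of_nodes: mass} -> {node: BetP(node)}."""
    bet_p = {}
    for group, m in masses.items():
        for node in group:
            bet_p[node] = bet_p.get(node, 0.0) + m / len(group)
    return bet_p

# Node v cooperates with {a}, {b} and the pair {a, b} with these masses:
masses_v = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.2,
            frozenset({"a", "b"}): 0.3}
print(pignistic(masses_v))   # {'a': 0.65, 'b': 0.35}
```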
In recent years, there has been deep interest in the analysis of various communities and complex networks. The identification of the most important elements of such networks is one of the main areas of research. However, the heterogeneity of real networks makes the problem both important and difficult. The SRIC and LRIC indices can be used to solve the problem, since they take into account the individual properties of nodes, the possibility of their group influence, and the topological structure of the whole network. However, the computational complexity of these indices needs further consideration. Our main focus is on the performance of the SRIC and LRIC indices. We propose several ways to decrease their computational complexity. A runtime comparison of the sequential and parallel computation of the proposed models is also given.
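A sketch of the simplest parallel mode, under the assumption that the index decomposes into independent per-node computations (the per-node routine below is a dummy stand-in for the actual SRIC/LRIC evaluation):

```python
from concurrent.futures import ProcessPoolExecutor

def node_index(node):
    """Placeholder for the expensive per-node SRIC/LRIC evaluation."""
    return node, sum(i * i for i in range(10_000))   # dummy workload

if __name__ == "__main__":
    nodes = list(range(100))
    with ProcessPoolExecutor() as pool:            # split nodes over processes
        index = dict(pool.map(node_index, nodes))
    print(len(index))
```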
Tornado prediction variables are analyzed using machine learning and decision analysis techniques. A model based on several choice procedures and the superposition principle is applied to different methods of data analysis. The constructed model has been tested on a database of tornadic events. It is shown that the tornado prediction model developed herein is more efficient than a previous set of machine learning models, opening the way to more accurate decisions.
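A sketch of the superposition principle for choice procedures: each procedure narrows the set of admissible alternatives, and the next one is applied to what remains (the procedures and data here are purely illustrative, not the paper's model):

```python
def threshold_rule(alternatives, scores, cut=0.5):
    return [a for a in alternatives if scores[a] >= cut]

def top_k_rule(alternatives, scores, k=2):
    return sorted(alternatives, key=lambda a: scores[a], reverse=True)[:k]

def superposition(alternatives, scores, procedures):
    for proc in procedures:                 # apply procedures one after another
        alternatives = proc(alternatives, scores)
    return alternatives

scores = {"e1": 0.9, "e2": 0.6, "e3": 0.4, "e4": 0.8}
print(superposition(list(scores), scores, [threshold_rule, top_k_rule]))
# -> ['e1', 'e4']: events still flagged after both procedures
```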
Real-world data sets often contain mislabelled entities. This can be particularly problematic if the data set is used by a supervised classification algorithm during its learning phase. In this case, the accuracy of the classification algorithm, when applied to unlabelled data, is likely to suffer considerably. In this paper, we introduce a clustering-based method capable of reducing the number of mislabelled entities in data sets. Our method can be summarised as follows: (i) cluster the data set; (ii) select the entities that have the most potential to be assigned to the correct clusters; (iii) use the entities of the previous step to define the core clusters and map them to the labels using a confusion matrix; (iv) use the core clusters and our cluster membership criterion to correct the labels of the remaining entities. We perform numerous experiments to validate our method empirically, using k-nearest neighbour classifiers as a benchmark. We experiment with both synthetic and real-world data sets with different proportions of mislabelled entities. Our experiments demonstrate that the proposed method produces promising results. Thus, it could be used as a data correction preprocessing step for a supervised machine learning algorithm.
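A sketch of steps (i)-(iv), assuming k-means and a simple distance-margin confidence criterion (the function name, margin and data are ours, for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

def correct_labels(X, y, n_clusters, margin=0.2):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)   # (i)
    d = km.transform(X)                       # distances to all centroids
    d.sort(axis=1)
    confident = (d[:, 1] - d[:, 0]) > margin                               # (ii)
    mapping = {}                # (iii) map core clusters to labels by majority
    for c in range(n_clusters):
        core = (km.labels_ == c) & confident
        if core.any():
            mapping[c] = np.bincount(y[core]).argmax()
    # (iv) relabel every entity by its (mapped) cluster
    return np.array([mapping.get(c, y[i]) for i, c in enumerate(km.labels_)])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
y[rng.choice(100, 10, replace=False)] ^= 1            # inject 10% wrong labels
print((correct_labels(X, y, 2) == [0] * 50 + [1] * 50).mean())  # close to 1.0
```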
DEA methods are commonly used for the assessment of regions' disaster vulnerability. However, most of these methods work with precise values of all the characteristics of the regions, whereas in real life much of the data consists of expert estimates or approximate values. We therefore propose to use modified DEA methods that take the imprecision of the data into account. We apply these methods to the evaluation of wildfire prevention measures in regions of the Russian Federation.
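A sketch of one standard way to handle interval data in DEA (not necessarily the paper's modification): the evaluated region takes its optimistic bounds, the others their pessimistic ones, and the CCR efficiency is computed by linear programming (the data are synthetic).

```python
import numpy as np
from scipy.optimize import linprog

def interval_efficiency(o, x_lo, x_hi, y_lo, y_hi):
    n, m = x_lo.shape          # n regions, m inputs
    s = y_lo.shape[1]          # s outputs
    x = x_hi.copy(); y = y_lo.copy()        # pessimistic data for the others
    x[o], y[o] = x_lo[o], y_hi[o]           # optimistic data for region o
    c = np.concatenate([-y[o], np.zeros(m)])            # max u . y_o
    a_ub = np.hstack([y, -x])                           # u.y_j - v.x_j <= 0
    a_eq = np.concatenate([np.zeros(s), x[o]])[None, :] # v . x_o = 1
    res = linprog(c, A_ub=a_ub, b_ub=np.zeros(n), A_eq=a_eq, b_eq=[1.0])
    return -res.fun            # efficiency in (0, 1]

x_lo = np.array([[1.0], [2.0]]); x_hi = x_lo * 1.1     # input: spending
y_lo = np.array([[3.0], [4.0]]); y_hi = y_lo * 1.1     # output: fires prevented
print([round(interval_efficiency(o, x_lo, x_hi, y_lo, y_hi), 3) for o in (0, 1)])
```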
The concept of conflict is one of the central concepts in the theory of belief functions, where a distinction is made between external and internal conflict. A new method for estimating internal conflict is proposed and studied in this paper. The method assumes that the original body of evidence was derived from simpler bodies of evidence by some combining rule; the internal conflict can therefore be considered as the external conflict of a decomposition of the original body of evidence. This approach is worked out in the article for decomposition by Dempster's rule and decomposition by the disjunctive consensus rule. The possible range of the internal conflict is found, in the case of these two combining rules, for single-focal (categorical) and two-focal bodies of evidence. The decomposition method is discussed in detail for the case of a universal set with two alternatives.
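A sketch of the conflict measure underlying the approach: the external conflict of two bodies of evidence is the total mass assigned to contradictory (empty-intersection) pairs of focal elements.

```python
from itertools import product

def conflict(m1, m2):
    """K = sum of m1(A) * m2(B) over all A, B with empty intersection."""
    return sum(a_mass * b_mass
               for (a, a_mass), (b, b_mass) in product(m1.items(), m2.items())
               if not a & b)

# Two-focal bodies of evidence on a universal set with two alternatives {x, y}:
m1 = {frozenset({"x"}): 0.8, frozenset({"x", "y"}): 0.2}
m2 = {frozenset({"y"}): 0.6, frozenset({"x", "y"}): 0.4}
print(conflict(m1, m2))   # 0.48: only ({x}, {y}) is a contradictory pair
```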
We consider the problem of individual manipulation under incomplete information, when voters do not know the full preference profile. Instead, voters know the result of an opinion poll (the outcome of a poll information function π, e.g. a list of scores or a set of winners). In this case, a voter has an incentive to misrepresent her preferences (π-manipulate) if she knows that she will not become worse off and there is a chance of becoming better off. We consider six voting rules and eight types of poll information functions differing in their informativeness. To compare manipulability, we first calculate the probability that there is a voter who has an incentive to π-manipulate and show that this measure is not informative in the case of incomplete information. We then suggest considering two other measures: the probability of a successful manipulation and an aggregate stimulus of voters to manipulate, which demonstrate more intuitive behavior. We provide the results of computational experiments as well as analytical proofs of some of the observed effects.
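A Monte Carlo sketch of the complete-information baseline of such measures, the share of random profiles in which some voter can improve the plurality outcome for herself by misreporting (3 candidates, lexicographic tie-breaking; the poll information functions of the paper would restrict what each voter knows):

```python
import random
from itertools import permutations

CANDS = "abc"

def plurality_winner(ballots):
    scores = {c: 0 for c in CANDS}
    for b in ballots:
        scores[b[0]] += 1
    return max(CANDS, key=lambda c: (scores[c], -CANDS.index(c)))

def manipulable(profile):
    honest = plurality_winner(profile)
    for i, pref in enumerate(profile):
        for lie in permutations(CANDS):
            outcome = plurality_winner(profile[:i] + [list(lie)] + profile[i+1:])
            if pref.index(outcome) < pref.index(honest):
                return True
    return False

random.seed(0)
trials = 2000
hits = sum(manipulable([random.sample(CANDS, 3) for _ in range(5)])
           for _ in range(trials))
print(hits / trials)   # estimated probability that a profile is manipulable
```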
The paper presents a polynomial-time algorithm for rescheduling traffic when one track of a double-track railway becomes unavailable, the remaining track has a siding, and there are two categories of trains: priority trains, such as passenger trains, and ordinary trains, such as the majority of freight trains. The algorithm minimises the negative effect caused by the track blockage, first for the priority trains and then for the ordinary trains, over the set of all schedules optimal for the priority trains.
This paper presents a relatively rare case of an optimization problem in data analysis that admits a globally optimal solution obtained by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of a domain taxonomy represented by a rooted tree. The idea is to “lift” the set to its “head subject” in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors, either “gaps” or “offshoots” or both. Our method globally minimizes a penalty function combining the numbers of head subjects, gaps and offshoots, differently weighted. We apply this to a collection of 17,645 research papers on Data Science published in 17 Springer journals over the past 20 years. We extract a taxonomy of Data Science (TDS) from the Association for Computing Machinery Computing Classification System 2012. We find fuzzy clusters of leaf topics over the text collection, optimally lift them to head subjects in the TDS, and comment on the tendencies of current research that follow from the lifting results.
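A sketch of the recursive lifting for a crisp query set Q (the paper works with fuzzy memberships, and the weights below are arbitrary): at every node we either make it a head subject, paying for the gaps it covers, or pass the decision down to the children; recursion over the tree yields the global minimum.

```python
def lift(node, children, leaves, q, w_head=1.0, w_gap=0.2, w_off=0.8):
    """Minimal penalty for the subtree rooted at `node` (crisp version)."""
    if node not in children:                      # a leaf
        return w_off if node in q else 0.0        # offshoot, or nothing to do
    sub = leaves[node]
    if not sub & q:                               # no query leaves below
        return 0.0
    head_cost = w_head + w_gap * len(sub - q)     # head subject + its gaps
    pass_down = sum(lift(c, children, leaves, q, w_head, w_gap, w_off)
                    for c in children[node])
    return min(head_cost, pass_down)

children = {"root": ["ml", "db"], "ml": ["clu", "cls"], "db": ["sql"]}
leaves = {"root": {"clu", "cls", "sql"}, "ml": {"clu", "cls"}, "db": {"sql"}}
print(lift("root", children, leaves, q={"clu", "cls"}))   # 1.0: head at "ml"
```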
In December 2019, the Basel Committee launched the consolidated Basel framework. The framework inherits the Basel II internal ratings-based (IRB) approach to credit risk largely unchanged. The absence of material methodological changes is unexpected given that the key shortcomings of the IRB approach remain unresolved. The objective of the paper is therefore to list its previously discussed shortcomings and to address the new ones. The latter include the unbalanced treatment of the PD and LGD parameters, as well as the methodological inconsistency in the treatment of expected and unexpected loss.
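For reference, the asymmetry in question is visible in the IRB unexpected-loss capital formula itself (maturity adjustment omitted): PD enters nonlinearly through the Vasicek quantile, while LGD merely scales the result linearly.

```latex
% IRB capital requirement K per unit of exposure, with N the standard
% normal CDF and R the supervisory asset correlation:
K = \mathrm{LGD}\cdot\left[
      N\!\left(\frac{N^{-1}(\mathrm{PD}) + \sqrt{R}\,N^{-1}(0.999)}
                    {\sqrt{1-R}}\right) - \mathrm{PD}
    \right],
\qquad
\mathrm{EL} = \mathrm{PD}\cdot\mathrm{LGD}\cdot\mathrm{EAD}.
```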
We study the world's largest credit risk losses since 1972, when the predecessor of the Basel Committee on Banking Supervision was established. Choosing round thresholds of a loss amount equivalent to at least current USD 100m and entity total assets in excess of current USD 500m as of the loss announcement date, we collected a dataset of 56 cases with total losses of current USD 700bn (or ca. USD 900bn in constant 2018 terms) over the last half-century. The two most unexpected findings are the following. First, we verified the announced loss amounts by analysing the dynamics of stock quotes around the loss announcement dates. In this way we were able to trace three cases in which the losses announced by the mass media appear to have been exaggerated. Second, there is a series of events in which a credit risk loss and an operational one were disclosed in combination. It seems that the latter might have been used to partially cover the former.
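A sketch of the verification logic: the announced loss is compared with the drop in market capitalisation around the announcement date, and an announced loss far exceeding the value actually wiped out is a candidate for exaggeration (the numbers and the 3x threshold below are made up for illustration):

```python
def implied_loss(price_before, price_after, shares_outstanding):
    """Market-cap drop around the announcement, a rough proxy for the loss."""
    return (price_before - price_after) * shares_outstanding

announced_loss = 2_000  # USD m, as reported by the media
drop = implied_loss(price_before=50.0, price_after=46.0,
                    shares_outstanding=100_000_000) / 1e6  # -> USD m
print(f"implied {drop:.0f}m vs announced {announced_loss}m:",
      "possibly exaggerated" if announced_loss > 3 * drop else "consistent")
```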
We consider an application of the long-range interaction centrality (LRIC) to the problem of influence assessment in the global retail food network. First, we transform the initial graph into a graph of directed intensities based on individual node characteristics and the possibility of group influence. Second, we apply different models of indirect influence estimation based on simple paths and random walks; this allows us to estimate node-to-node influence in the network. Finally, we aggregate the node-to-node influences into an influence index. The model is applied to the food trade network built from the World Integrated Trade Solution database. The results obtained for global trade in different product commodities are compared with classical centrality measures.
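A sketch of one path-based variant of the indirect influence step (LRIC has several variants; here the influence of i on j is the maximum, over simple paths of bounded length, of the product of direct intensities along the path, and the network is a toy example):

```python
def indirect_influence(direct, i, j, max_len=3, visited=None):
    visited = (visited or set()) | {i}
    best = direct.get((i, j), 0.0)           # direct intensity, if any
    if max_len > 1:
        for (a, b), w in direct.items():
            if a == i and b not in visited:  # extend a simple path from i
                best = max(best,
                           w * indirect_influence(direct, b, j,
                                                  max_len - 1, visited))
    return best

# Directed intensities in a toy trade network (exporter -> importer):
direct = {("RU", "CN"): 0.6, ("CN", "US"): 0.5, ("RU", "US"): 0.1}
print(indirect_influence(direct, "RU", "US"))   # 0.3 via RU -> CN -> US
```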
Using the SIPRI Arms Transfers Database, which covers all trade in military equipment over the period 1950–2018, we examine the relationships between countries from a novel empirical perspective. We consider the arms transfers network as a multiplex network in which each layer corresponds to a particular armament category. First, we analyze how the different layers overlap and elucidate the main ties between countries. Second, we consider different patterns of trade in order to identify countries specializing in particular armament categories and analyze how their export structure changes over time. We also examine how countries influence each other on different layers of the multiplex network. Finally, we analyze the influence of countries in the whole network.
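A sketch of the layer-overlap step: each armament category is a layer, i.e. a set of directed trade edges, and overlap between two layers can be measured by the Jaccard index of their edge sets (the edges below are illustrative, not SIPRI data):

```python
def jaccard(edges_a, edges_b):
    return len(edges_a & edges_b) / len(edges_a | edges_b)

layers = {
    "aircraft": {("US", "KR"), ("RU", "IN"), ("FR", "EG")},
    "ships":    {("RU", "IN"), ("FR", "EG"), ("DE", "GR")},
}
print(jaccard(layers["aircraft"], layers["ships"]))   # 0.5
```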