
The Problem of Data Binning. Comparison of approaches.

Student: Safiullina Alsu

Supervisor: Yuliana N. Tolstova

Faculty: Faculty of Sociology

Educational Programme: Bachelor

Year of Graduation: 2014

<p><strong>The problem of data binning. Comparison of different approaches.</strong></p><p>The present study addresses the problem of data binning or, in other words, data grouping. Many publications propose different methods, but there is no recommendation on which method to use in which case. Using an inappropriate grouping method may deprive the collected data of any value, which means that a researcher will waste time, effort and other resources.</p><p>The overall goal of this study was to systematize existing methods of data grouping and to provide recommendations about which method suits which research task.</p><p>To achieve this goal, the following tasks were set:</p><p>- classify existing methods;</p><p>- choose two methods for comparison;</p><p>- provide meaningful interpretations of the chosen methods;</p><p>- apply the methods to a database and compare the results;</p><p>- identify the types of tasks that can be solved using one method or the other.</p><p>The subject of the study was the difference between the compared methods. Two methods of data grouping were selected as the object of the study.</p><p>In the study, two approaches suggested by Sturges and Scott, which help to group values into equal intervals, were analyzed theoretically. The selected approaches were then tried on a database provided by the European Social Survey community. The only requirement for that database was the presence of a variable that can be grouped. After the grouping procedure, different methods of data analysis were applied to that variable in discrete form and to other variables in the database, and the results were compared.
As a result, two statements were supported:</p><p>- different methods of grouping lead to different partitions of the values;</p><p>- different methods of grouping lead to different results in subsequent data analysis.</p><p>The findings shed more light on the optimal grouping problem and set priorities for further analysis of the methods that were not considered in this work.</p>
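
The first finding can be illustrated with a minimal sketch. It assumes the textbook forms of the two rules: Sturges' rule for the number of bins, k = 1 + log2(n) rounded up, and Scott's rule for the bin width, h = 3.49 · s · n^(-1/3), where s is the sample standard deviation. The summary does not state which exact forms the thesis used, so the formulas, the function names and the sample data below are illustrative assumptions, not the author's implementation or the European Social Survey data.

```python
import math
import statistics


def sturges_bin_count(n):
    """Sturges' rule (textbook form): k = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def scott_bin_width(values):
    """Scott's rule (textbook form): h = 3.49 * s * n^(-1/3)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation
    return 3.49 * s * n ** (-1 / 3)


def scott_bin_count(values):
    """Number of equal-width bins of Scott's width needed to cover the data range."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / scott_bin_width(values))


def equal_width_edges(values, k):
    """Edges of k equal-width intervals spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


# Hypothetical variable: 64 respondents aged 18..81 (made-up data).
ages = list(range(18, 82))

k_sturges = sturges_bin_count(len(ages))
k_scott = scott_bin_count(ages)

print("Sturges bins:", k_sturges, equal_width_edges(ages, k_sturges))
print("Scott bins:  ", k_scott, equal_width_edges(ages, k_scott))
```

On this made-up variable the two rules give a different number of intervals, so the same values are partitioned differently before any further analysis is run.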

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
