Olympic Data Analysis in Python

Introduction

This case study uses historical Olympic data to identify patterns that shed light on how the Games and the athletes who compete in them have changed over time. The initial hypotheses concerned the distribution of Body Mass Index (BMI), the trend in female participation, and the relationship between athletes' height and weight. A combination of SQL and Python was used to extract, clean, and analyze the data. Several technical difficulties arose during the analysis, including inconsistent and missing data, a large dataset, and intricate SQL queries. The findings supported the original hypotheses by showing a positive relationship between athletes' height and weight and a long-term upward trend in female participation. These findings shed light on Olympic participation patterns and underline the value of taking variables such as gender and physical attributes into account. The case study also shows how SQL extraction and manipulation skills, which are essential for data analysts, apply in a real-world setting.

Initial Hypotheses

Our study is based on historical data from the Olympic Games. Our goal was to find patterns that would help us understand how the sports and the athletes who compete in them have changed over time. Our initial hypotheses were:
Body Mass Index (BMI) Distribution: Given the physical demands of professional sports and the focus on fitness and health, we predicted that athletes' BMI would fall within the normal range.

Data Interpretation

To extract, clean, and analyze the data, our strategy combined Python and SQL, taking advantage of each language's strengths.
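As a rough illustration of how SQL and Python can be combined for the BMI hypothesis, the sketch below loads a few rows into an in-memory SQLite database, computes BMI in SQL, and classifies the result in Python. The table name `athletes`, the column names, and the sample rows are all hypothetical and are not taken from the original dataset. The cut-offs are the commonly cited WHO adult ranges (18.5 to 24.9 counts as normal).

```python
import sqlite3

# Hypothetical sample rows (name, sex, height in cm, weight in kg).
# These values are made up for illustration, not taken from the real dataset.
SAMPLE_ROWS = [
    ("Athlete A", "F", 170.0, 60.0),
    ("Athlete B", "M", 180.0, 75.0),
    ("Athlete C", "M", 200.0, 110.0),
]


def load_bmi(rows):
    """Load rows into an in-memory SQLite table and return (name, bmi) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE athletes "
            "(name TEXT, sex TEXT, height_cm REAL, weight_kg REAL)"
        )
        conn.executemany("INSERT INTO athletes VALUES (?, ?, ?, ?)", rows)
        # BMI = weight (kg) / height (m) squared, computed on the SQL side.
        query = """
            SELECT name,
                   weight_kg / ((height_cm / 100.0) * (height_cm / 100.0)) AS bmi
            FROM athletes
            WHERE height_cm IS NOT NULL AND weight_kg IS NOT NULL
            ORDER BY name
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def bmi_category(bmi):
    """Classify a BMI value using the commonly cited WHO adult cut-offs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


bmi_rows = load_bmi(SAMPLE_ROWS)
for name, bmi in bmi_rows:
    print(f"{name}: BMI {bmi:.1f} ({bmi_category(bmi)})")
```

Doing the arithmetic in SQL keeps the extraction step self-contained, while the classification stays in Python, where it is easier to adjust.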
Missing Data: The dataset had some entries with missing values, mostly in the weight and height columns. This was a problem because these fields were essential to our investigation. To resolve it, we excluded these records from the analyses in which the fields were essential.

Inconsistent Data: We discovered several inconsistencies in the data, including differences in how the Olympic Games were named (e.g., "Summer" vs. "S"). We resolved this by standardizing the values.
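The two cleaning steps described above might look roughly like the following in pandas. This is a minimal sketch, not the original code: the column names (`Season`, `Height`, `Weight`), the sample rows, and the "W" abbreviation for Winter are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample; the column names and values are assumptions.
raw = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Season": ["Summer", "S", "Winter", "W"],
        "Height": [170.0, None, 182.0, 165.0],
        "Weight": [60.0, 72.0, None, 55.0],
    }
)

# Map abbreviated season labels onto the full names.
# Labels that are not in the mapping are left unchanged.
SEASON_MAP = {"S": "Summer", "W": "Winter"}
cleaned = raw.copy()  # work on an explicit copy rather than a slice
cleaned["Season"] = cleaned["Season"].replace(SEASON_MAP)

# Keep only rows that have both measurements, for the analyses that need them.
# `cleaned` itself still holds every row for analyses that do not.
complete = cleaned.dropna(subset=["Height", "Weight"])

print(cleaned["Season"].unique().tolist())
print(len(cleaned), "rows in total,", len(complete), "with height and weight")
```

Dropping incomplete rows only in a separate frame matches the approach described above, where records were excluded from some analyses rather than deleted from the dataset. Working on an explicit `.copy()` is also one common way to avoid the pandas "set on a copy of a slice" warning.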
Output:
['Overall', 1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 2002, 2004, 2006, 2008, 2010, 2012, 2014]

Output:
array(['China', 'Denmark', 'Netherlands', 'Finland', 'Norway', 'Romania',
       'Estonia', 'France', 'Morocco', 'Spain', 'Egypt', 'Iran',
       'Bulgaria', 'Italy', 'Chad', 'Azerbaijan', 'Sudan', 'Russia',
       'Argentina', 'Cuba', 'Belarus', 'Greece', 'Cameroon', 'Turkey',
       'Chile', 'Mexico', 'USA', 'Nicaragua', 'Hungary', 'Nigeria',
       'Algeria', 'Kuwait', 'Bahrain', 'Pakistan', 'Iraq', 'Syria',
       'Lebanon', 'Qatar', 'Malaysia', 'Germany', 'Canada', 'Ireland',
       'Australia', 'South Africa', 'Eritrea', 'Tanzania', 'Jordan',
       'Tunisia', 'Libya', 'Belgium', 'Djibouti', 'Palestine', 'Comoros',
       'Kazakhstan', 'Brunei', 'India', 'Saudi Arabia', 'Maldives', ...,
       'Virgin Islands, British', 'Mozambique', 'Virgin Islands, US',
       'Central African Republic', 'Madagascar', 'Bosnia and Herzegovina',
       'Guam', 'Cayman Islands', 'Slovakia', 'Barbados', 'Guinea-Bissau',
       'Timor-Leste', 'Democratic Republic of the Congo', 'Gabon',
       'San Marino', 'Laos', 'Botswana', 'South Korea', 'Cambodia',
       'North Korea', 'Solomon Islands', 'Senegal', 'Cape Verde',
       'Equatorial Guinea', 'Boliva', 'Antigua', 'Andorra', 'Zimbabwe',
       'Grenada', 'Saint Lucia', 'Micronesia', 'Myanmar', 'Malawi',
       'Zambia', 'Taiwan', 'Sao Tome and Principe', 'Macedonia',
       'Liechtenstein', 'Montenegro', 'Gambia', 'Cook Islands', 'Albania',
       'Swaziland', 'Burkina Faso', 'Burundi', 'Aruba', 'Nauru',
       'Vietnam', 'Bhutan', 'Marshall Islands', 'Kiribati'])

#country = df['region'].unique().tolist()

A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Output:
Year
1896    120
1900    300
1904    280
1906    224
1908    322
1912    316
1920    449
1924    391
1928    356
1932    370
1936    422
1948    439
2016    973
Name: Medal, dtype: int64

Output:
Year
1896     19
1900     54
1904    231
1906     23
1908     46
1912     63
1920     95
1924     99
2012    103
2016    121
Name: Medal, dtype: int64

Observations

In our more in-depth investigation, we concentrated on two topics: the relationship between an athlete's weight and height, and the historical trend in female participation. These are the results:

Height and Weight Correlation: We found a positive correlation (about 0.66) between athletes' height and weight. Taller athletes tend to be heavier, which is consistent with the general relationship between height and weight in humans. Because each sport has its own physical demands, the strength of this correlation may vary from sport to sport.

Trend in Female Participation: The percentage of female athletes has clearly increased over time. Female participation in the Olympic Games was very low in the early editions but has grown gradually. Women made up approximately 45% of athletes at the 2016 Rio Olympics, which marks substantial progress toward gender equality at the Games.

These observations highlight important patterns in the data. The rise in female participation was already visible in the preliminary data; a graph showing the proportion of female athletes in each edition of the Games would give a clearer picture of this trend.

Summary

Our examination of the Olympic Games dataset led to several important conclusions, among them the positive correlation between athletes' height and weight and the steady rise in female participation over time.
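The two observations discussed above, the height-weight correlation and the share of female athletes per edition, could be computed along the following lines. This is only a sketch under stated assumptions: the column names (`Year`, `Sex`, `Height`, `Weight`) and the sample rows are hypothetical, and the data is assumed to hold one row per athlete per edition. The figures it prints therefore describe the toy sample, not the results reported in this study.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions.
athletes = pd.DataFrame(
    {
        "Year": [1900, 1900, 1900, 1900, 2016, 2016, 2016, 2016],
        "Sex": ["M", "M", "M", "F", "M", "F", "M", "F"],
        "Height": [170.0, 175.0, 180.0, 160.0, 185.0, 165.0, 190.0, 172.0],
        "Weight": [65.0, 70.0, 80.0, 52.0, 85.0, 58.0, 95.0, 63.0],
    }
)

# Pearson correlation between height and weight,
# using only rows where both measurements are present.
measured = athletes.dropna(subset=["Height", "Weight"])
height_weight_corr = measured["Height"].corr(measured["Weight"])

# Share of female athletes per edition, in percent:
# the mean of a boolean flag within each year is the fraction of women.
female_share = (
    athletes["Sex"]
    .eq("F")
    .groupby(athletes["Year"])
    .mean()
    .mul(100)
    .rename("FemalePct")
)

print(f"Height-weight correlation: {height_weight_corr:.2f}")
print(female_share)
```

The resulting `female_share` series is indexed by year, so it could be passed directly to a plotting library to draw the participation trend.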