Data-Driven Astronomy

I’ve completed the Data-Driven Astronomy course by the University of Sydney on Coursera. It is a very interesting course because it combines something I really like, astronomy, with my recent interest in learning about data science and machine learning.

In astronomy, multiple instruments and experiments capture tons of data, and it is often hard to parse that data or draw meaningful conclusions from it by hand. That makes technologies that can process vast volumes of data, and also learn from them, highly valuable.

It is a great course, and I learned a good deal not only about astronomy but also about Python modules I didn’t know before!

The course teaches things such as:

  • Reading data from CSV, FITS, and other file formats, and operating on that data.
  • Doing mathematical computations with NumPy.
  • Comparing the efficiency of naive mean and median implementations against their NumPy versions.
  • Implementing the BinApprox algorithm and applying it to images.
  • Using SQL to easily run queries, aggregations, etc. on structured data.
  • Using scikit-learn to build decision trees that make predictions from features.
  • Comparing what models predict with the actual values.
  • Learning to guard against overfitting and bias by using k-fold cross-validation.
  • Building random forest classifiers with cross-validation.
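
To make the BinApprox item above a bit more concrete, here is a minimal sketch of the idea as I understand it: bin the values that fall within one standard deviation of the mean, then walk the bins until the running count reaches the middle rank. The function name `median_approx` and the parameter `n_bins` are my own choices for illustration, and this version uses only the Python standard library rather than NumPy, so treat it as an approximation of the exercise rather than the course's reference solution.

```python
import statistics


def median_approx(values, n_bins):
    """Approximate the median of a sequence with a BinApprox-style histogram."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, like NumPy's default
    if std == 0:
        return mean  # all values are identical

    lower = mean - std
    width = 2 * std / n_bins
    below = 0            # count of values smaller than mean - std
    bins = [0] * n_bins  # histogram over [mean - std, mean + std)

    for v in values:
        if v < lower:
            below += 1
        elif v < mean + std:
            # Clamp the index to guard against float rounding at the upper edge.
            bins[min(int((v - lower) / width), n_bins - 1)] += 1
        # Values >= mean + std are ignored.

    # Walk the bins until the running count reaches the middle rank.
    middle = (len(values) + 1) / 2
    count = below
    chosen = n_bins - 1  # fall back to the last bin if the rank is never reached
    for i, bin_count in enumerate(bins):
        count += bin_count
        if count >= middle:
            chosen = i
            break

    return lower + width * (chosen + 0.5)  # midpoint of the chosen bin
```

For example, `median_approx([1, 1, 3, 2, 2, 6], 3)` gives roughly 2.5, while the exact median is 2. The result gets closer as `n_bins` grows, and the appeal for image stacks is that only the mean, the standard deviation, and the bin counts need to be kept, not every value.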

All of this is applied to astronomical data for:

  • Doing computations on multiple sources of sky data, e.g. finding the median sky flux for each pixel.
  • Computing angular distances by converting RA and Dec coordinates to degrees and applying the great-circle distance.
  • Doing interesting computations on known sky catalogs such as AT20G BSS and SuperCOSMOS.
  • Cross-matching catalog data naively, with binary search, and with Astropy k-d trees.
  • Clustering kinds of stars according to redshift and astronomical color.
  • Training models with features like astronomical colors and targets like redshifts.
  • Classifying galaxies based on features like color index, eccentricity, adaptive moments, etc.
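
As an illustration of the distance and cross-matching items above, the sketch below converts sexagesimal coordinates to decimal degrees, computes the great-circle (angular) distance with the haversine formula, and does a naive nearest-neighbour cross-match. All function names here (`hms2dec`, `dms2dec`, `angular_dist`, `crossmatch`) and the catalog layout (lists of `(ra, dec)` tuples in degrees) are assumptions made for this example, not necessarily what the course uses.

```python
import math


def hms2dec(hours, minutes, seconds):
    """Right ascension in hours, minutes, seconds to decimal degrees."""
    return 15 * (hours + minutes / 60 + seconds / 3600)


def dms2dec(degrees, arcmin, arcsec):
    """Declination in degrees, arcminutes, arcseconds to decimal degrees."""
    sign = math.copysign(1, degrees)
    return sign * (abs(degrees) + arcmin / 60 + arcsec / 3600)


def angular_dist(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two points given in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = math.sin((d1 - d2) / 2) ** 2
    b = math.cos(d1) * math.cos(d2) * math.sin((r1 - r2) / 2) ** 2
    # min() keeps asin in its domain if rounding pushes the value above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a + b))))


def crossmatch(cat1, cat2, max_radius):
    """Naive O(n * m) match: closest cat2 object within max_radius degrees."""
    matches, no_matches = [], []
    for i, (ra1, dec1) in enumerate(cat1):
        best_j, best_dist = None, math.inf
        for j, (ra2, dec2) in enumerate(cat2):
            dist = angular_dist(ra1, dec1, ra2, dec2)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_dist <= max_radius:
            matches.append((i, best_j, best_dist))
        else:
            no_matches.append(i)
    return matches, no_matches
```

The naive version compares every pair of objects, which is why sorting one catalog and using binary search, or using a k-d tree, pays off quickly on real catalogs.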

There are plenty of results that are text only, but these are some of the nice plots generated by the exercises:

  • Right Ascension, Declination, Flux
  • Redshift of galaxies and QSOs
  • Cluster stars by Color and Redshift
  • Comparing Overfitting of Trees By Depth
  • Comparing measurements with predictions
  • Confusion Matrix For Galaxy Classification