SPACEKIT Radio: scraping NASA data
by Ru Kein
spacekit.radio
This module is used to access the STScI public dataset [astroquery.mast.cloud] hosted in S3 on AWS. In this post, I’ll show you how to scrape and download NASA space telescope datasets that can be used in astronomical machine learning projects (or any astronomy-based analysis and programming). For this demonstration we’ll call the API to acquire FITS files containing the time-validated light curves of stars with confirmed exoplanets. The datasets all come from the K2 space telescope (Kepler phase 2).
Prerequisites
Creation of a virtual-env and an AWS account (use us-east-1 region) is recommended.
Install Dependencies
$ pip install spacekit[x]
Import Packages
import os
from spacekit.extractor.radio import Radio
AWS OpenData Bucket Config (optional)
To acquire data from MAST directly, no additional setup is required (though download speeds may be slower). This demo focuses specifically on downloading from the OpenData s3 bucket.
Configure AWS access using your account credentials
Before we can fetch data we need to configure our AWS credentials (this demo assumes you have already set up an account). Create a config directory and save your credentials in a file called awscli.ini. Note this mostly needed when running a notebook - within a python shell Boto3 is typically able to determine creds as soon as you’ve exported the environment variables.
os.makedirs('config', exist_ok=True)
text = '''
[default]
aws_access_key_id = <access_id>
aws_secret_access_key = <access_key>
aws_session_token= <token>
'''
path = "./config/awscli.ini"
with open(path, 'w') as f:
f.write(text)
Set the credentials via config file
Now that we have our credentials stored in a file locally, we can set the path as an environment variable and call it from within the notebook (Jupyter or Google Colab).
!export AWS_SHARED_CREDENTIALS_FILE=./config/awscli.ini
path = path
os.environ['AWS_SHARED_CREDENTIALS_FILE'] = path
Download data sets via AWS/MAST api
Kepler observed parts of a 10 by 10 degree patch of sky near the constellation of Cygnus for four years (17, 3-month quarters) starting in 2009. The mission downloaded small sections of the sky at a 30-minute (long cadence) and a 1-minute (short cadence) in order to measure the variability of stars and find planets transiting these stars. These data are now available in the public S3 bucket on AWS at s3://stpubdata/kepler/public.
These data are available under the same terms as the public dataset for Hubble and TESS, that is, if you compute against the data from the AWS US-East region, then data access is free.
This script queries MAST for TESS FFI data for a single sector/camera/chip combination and downloads the data from the AWS public dataset rather than from MAST servers.
Targets with confirmed exoplanets for K2 mission
os.makedirs('./data/mast', exist_ok=True)
os.chdir('./data/mast')
K2_confirmed_planets = ['K2-21','K2-28','K2-39','K2-54','K2-55','K2-57','K2-58','K2-59','K2-60','K2-61','K2-62','K2-63','K2-64','K2-65','K2-66', 'K2-68','K2-70','K2-71','K2-72','K2-73','K2-74','K2-75','K2-76',
'K2-116','K2-167','K2-168','K2-169','K2-170','K2-171','K2-172']
radio = Radio(config="enable")
radio.target_list = K2_confirmed_planets
radio.cone_search("0s", "K2", 1800.0, "LLC")
radio.get_object_uris()
radio.s3_download()
Download Complete
Alt: Download larger dataset of all confirmed Kepler planets using requests api from NASA
import requests
r=requests.get("https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name,k2_name+from+k2names&format=json")
results = r.json()
radio.target_list = [r['k2_name'] for r in results]
radio.get_object_uris()
radio.s3_download()
In the next several posts, we’ll use these datasets to plot light curves and frequency spectrographs then build a convolutional neural network to classify stars that host a transiting exoplanet.
Contact
If you want to contact me you can send a message on LinkedIn.
License
This project uses the following license: MIT License.