October 7, 2011

Open Tasking: Exoplanet Data in JSON Format NEEDED

I love programming, and I was thankful to hear that today is Ada Lovelace Day. Ada Lovelace is widely regarded as the world's first computer programmer. I was going to do something else today, but Lovelace reminded me to code instead. And so I did.

I worked on some exoplanet data today, with the goal of creating a code-accessible database of exoplanets. But time flies so fast. I can't do it alone, but I know I belong to a hive of coders, and I'm just one node among many in the programming collective. So I'd like to tap the collective and start "Open Tasking". It works like this: I'll tell you where I'm at with this self-inflicted project, and then I'll let you know what kind of help I need on a particular task. In return, I will share what I learn, in the hope that it will benefit others.

Basically, I am setting up a CouchDB database for exoplanets. It will be something anyone can use and replicate for any purpose. There are a ton of sub-tasks to finish before it becomes a reality, so I am posting this as I go along. At the moment, I need help writing a script to convert the exoplanet data from XML to JSON so I can import it into CouchDB.
The source data can be found here: Open Exoplanet Catalogue [ https://github.com/hannorein/open_exoplanet_catalogue/tree/master/data ], and the converted JSON will be stored here: Exoplanets at Cloudant [ https://cloudant.com/futon/database.html?metapsyche%2Fexoplanets/_all_docs ].
[ If you really want to take a peek at where I'm at right now, feel free to check The Exoplanet Viewer. It's really nothing at this point, just some preliminary code. ]
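
To make the conversion task concrete, here is a rough sketch of what such a script might look like in plain JavaScript with no libraries. Everything in it is an assumption on my part: the function names (xmlToObject, addChild, decodeEntities), the "@" prefix for attributes, and the "#text" key are illustrative conventions, not anything defined by the Open Exoplanet Catalogue or CouchDB. It only copes with simple, well-formed XML.

```javascript
// Minimal XML-to-object converter: a sketch for simple, well-formed XML only.
// Not handled: CDATA, DOCTYPE, namespaces. Mixed content (text next to child
// elements) is flattened into a single "#text" value.

// Decode the five predefined XML entities.
function decodeEntities(s) {
  return s.replace(/&(lt|gt|quot|apos|amp);/g, function (match, name) {
    return { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' }[name];
  });
}

// Store a child under its tag name; repeated tag names become an array.
function addChild(parent, name, value) {
  if (!Object.prototype.hasOwnProperty.call(parent, name)) parent[name] = value;
  else if (Array.isArray(parent[name])) parent[name].push(value);
  else parent[name] = [parent[name], value];
}

function xmlToObject(xml) {
  // Drop the XML declaration, processing instructions, and comments.
  xml = xml.replace(/<\?[\s\S]*?\?>/g, '').replace(/<!--[\s\S]*?-->/g, '');
  var tagRe = /<(\/?)([\w:.-]+)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>/g;
  var attrRe = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  var root = {};
  var stack = [{ name: null, node: root, text: '' }];
  var last = 0;
  var m, a;

  while ((m = tagRe.exec(xml)) !== null) {
    var top = stack[stack.length - 1];
    top.text += xml.slice(last, m.index); // text seen since the previous tag
    last = tagRe.lastIndex;

    if (m[1]) {
      // Closing tag: finish the current element and attach it to its parent.
      if (stack.length < 2 || top.name !== m[2]) {
        throw new Error('Unexpected </' + m[2] + '>');
      }
      stack.pop();
      var text = decodeEntities(top.text.trim());
      var hasKeys = Object.keys(top.node).length > 0;
      if (hasKeys && text) top.node['#text'] = text;
      addChild(stack[stack.length - 1].node, top.name, hasKeys ? top.node : text);
    } else {
      // Opening (or self-closing) tag: attributes become "@name" keys.
      var node = {};
      attrRe.lastIndex = 0;
      while ((a = attrRe.exec(m[3])) !== null) {
        node['@' + a[1]] = decodeEntities(a[2] !== undefined ? a[2] : a[3]);
      }
      if (m[4]) addChild(top.node, m[2], Object.keys(node).length ? node : '');
      else stack.push({ name: m[2], node: node, text: '' });
    }
  }
  if (stack.length !== 1) {
    throw new Error('Unclosed <' + stack[stack.length - 1].name + '>');
  }
  return root;
}
```

Calling JSON.stringify(xmlToObject(xmlText), null, 2) on the contents of one XML file would then give a JSON string. All values come out as strings, so numeric fields would still need converting in a later step.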

I've already contacted Hanno Rein of the Open Exoplanet Catalogue, and he said he has no plans to provide the data in JSON format. Bummer. So right now I am trying to write JavaScript code to convert XML to JSON, so I can automate a batch conversion of the XML exoplanet data and then load the results into my CouchDB database at Cloudant. Why JavaScript? So I can use it with Node.js and build a streamlined process that keeps up with the fast-paced exoplanet updates. So if you already have experience with this task, and you already have a working set of code, please help me.
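
For the loading step, one option is CouchDB's bulk document API, which accepts a JSON body of the form {"docs": [...]} via POST to the database's _bulk_docs URL. The sketch below only builds that body from records that have already been converted to plain objects. The names toBulkDocs and slugId are hypothetical helpers of my own, and deriving the _id from a "name" field is just an assumption about how the records might look.

```javascript
// Hypothetical helper: derive a stable, URL-friendly _id from a record's name.
// Assumes each record has a string "name" field.
function slugId(record) {
  return record.name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Build the JSON request body for CouchDB's POST /{db}/_bulk_docs endpoint.
// `records` is an array of plain objects; `idOf` picks the _id for each one.
function toBulkDocs(records, idOf) {
  var docs = records.map(function (record) {
    var doc = { _id: idOf(record) };
    Object.keys(record).forEach(function (key) {
      // Top-level keys starting with "_" are reserved by CouchDB, so skip them.
      if (key.charAt(0) !== '_') doc[key] = record[key];
    });
    return doc;
  });
  return JSON.stringify({ docs: docs });
}
```

The returned string would be sent with a Content-Type of application/json. Using a stable _id rather than a generated one means a re-run of the batch targets the same documents instead of creating duplicates, although updating an existing document also requires its current _rev, which this sketch does not fetch.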

I've done quite a bit of research on this and haven't found any usable code so far. The JSON output should validate at JSONLint [ http://jsonlint.com ]. At the moment, I am using http://extjs.org.cn/xml2json/xml2json_online.php to convert the XML manually. But as I said, I need a streamlined process to keep up with the rapid pace of exoplanet data growth and updates.
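
As a rough local stand-in for pasting output into JSONLint, a batch script could run each converted string through the built-in JSON.parse, which rejects anything that is not strict JSON (unquoted keys, trailing commas, and so on). The function name jsonError is my own, hypothetical choice.

```javascript
// Returns null when `text` is strictly valid JSON,
// otherwise the parser's error message as a string.
function jsonError(text) {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    return err.message;
  }
}
```

For example, jsonError('{"zip_cd": "10036"}') returns null, while jsonError('{zip_cd: "10036"}') returns an error message because the key is not quoted. The exact wording of the message depends on the JavaScript engine.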

Links:
Exoplanets and Open Data
Open Exoplanet Data on CouchDB

5 comments:

Alain Couthures said...

My XForms implementation (XSLTForms) now supports JSON instances: an XSLT stylesheet converts the internal XML document into JSON.
Except for Konqueror, every browser now has its own XSLT 1.0 engine, which is very fast.
Using Javascript, there are only two different ways to program this: for IE and for others ;-)

Andrii said...

Good idea! For streamlined converting - see library http://kawa.net/works/js/jkl/parsexml-e.html

Example from docs:

XML SOURCE
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<items>
 <item>
  <zip_cd>10036</zip_cd>
  <us_state>NY</us_state>
  <us_city>New York</us_city>
  <us_dist>Broadway</us_dist>
 </item>
</items>

SCRIPT
<script type="text/javascript" src="jkl-parsexml.js"></script>
<script type="text/javascript">
  var url = "zip-e.xml";
  var xml = new JKL.ParseXML( url );
  var data = xml.parse();
  document.write( data["items"]["item"]["us_state"] );
  document.write( data.items.item.us_state );
</script>

OUTPUT JSON
{
 "items": {
  "item": {
   "zip_cd": "10036",
   "us_state": "NY",
   "us_city": "New York",
   "us_dist": "Broadway"
  }
 }
}

The JavaScript could perhaps be run server-side with Rhino to harvest the XML files automatically and push the JSON data to the CouchDB store.

daveryan said...

Hey,

I have knocked up a quick node.js script to load the xml files with xml2js (https://github.com/Leonidas-from-XIV/node-xml2js)

and write the files to couchdb with
cradle (https://github.com/cloudhead/cradle)

seems to work pretty nicely.

how can I contact you directly?

cheers Dave

Anonymous said...

You said you'd like to convert XML to JSON with node.js. Why not let CouchDB handle that?

The Mozilla Spidermonkey Javascript Engine that is used by CouchDB contains the E4X extension (see https://developer.mozilla.org/en/E4X), which makes parsing XML (and hence converting it to JSON) extremely easy.

Instead of doing the conversion using node.js, why don't you just put the XML as it is into CouchDB and let an update handler (http://wiki.apache.org/couchdb/Document_Update_Handlers) convert it to JSON?

metapsyche said...

Hey Daveryan, I'd love to reinvent the wheel, but you said you already had the code working with node-xml2js. You may contact me via twitter @exoplanetology or you can just paste the code in the comment section here if it fits. Thank you!