BigQuery (BQ) is a petabyte-ready proprietary Google data warehouse product. Pretty much every day in my work, I use BQ to process billions of rows of data. But sometimes tricky situations arise. For example, what if my table has a large number of rows and columns? How does BQ deal with that?
As with other SQL dialects, apart from a few syntactic shortcuts such as SELECT * EXCEPT, every query requires us to type out the names of all the desired columns by hand. But what if I have a table with hundreds of columns, and I want to compute, say, all…
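One common workaround, which is not necessarily the approach this article takes, is to generate the SQL text programmatically instead of typing hundreds of column names. Below is a minimal sketch under stated assumptions. The table name, the `x0 … x299` column names and the choice of `AVG` as the aggregate are all hypothetical. The column names are assumed to be valid unquoted identifiers. In practice the list could be read from the dataset's `INFORMATION_SCHEMA.COLUMNS` view.

```python
from typing import Iterable


def build_column_query(table: str, columns: Iterable[str], agg: str = "AVG") -> str:
    """Build a SELECT that applies one aggregate function to every listed column.

    `table`, `columns` and `agg` are hypothetical inputs; column names are
    assumed to be plain identifiers that need no quoting.
    """
    cols = list(columns)
    if not cols:
        raise ValueError("need at least one column")
    # One "AGG(col) AS agg_col" expression per column, comma-separated.
    exprs = ",\n  ".join(f"{agg}({c}) AS {agg.lower()}_{c}" for c in cols)
    return f"SELECT\n  {exprs}\nFROM `{table}`"


if __name__ == "__main__":
    # Hypothetical wide table with 300 numeric columns named x0 ... x299.
    sql = build_column_query("my_project.my_dataset.wide_table",
                             [f"x{i}" for i in range(300)])
    print(sql[:120])
```

The generated string can then be pasted into the console or submitted through a client library. The query stays one ordinary SELECT, so BigQuery's usual billing by bytes scanned still applies.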
The word “quantum” is often associated with complicated equations and unintuitive physical phenomena. Yet, our world is unapologetically quantum; quantum physics governs all that exists in the Universe. This fact generates a lot of confusion amongst the general public: how can something so unintuitive — quantum physics — describe the intuitive world?
Well, one source of vexation is the old-school way of discussing quantum physics: there are some magical wave-functions that let objects be in a mixture of two different places or situations, and these “wave-functions” can change (or collapse) instantaneously across vast distances when someone takes a…
BigQuery (BQ) is Google’s proprietary data warehouse product, advertised as ready for the petabyte (PB) scale. However, it’s not immediately obvious how to scale to PBs. In fact, given BQ’s cost structure ($5/TB), PB-scale analytics seems prohibitively expensive: querying 1 PB of data would rack up a bill of $5,000! Of course, dedicated slots are available to keep costs down, but the question remains: how does one leverage BQ for lightning-fast analytics that scales affordably to PBs?
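The $5,000 figure is simple on-demand arithmetic: the bill is the number of terabytes scanned times the per-terabyte rate. The sketch below makes that explicit. It assumes the $5/TB rate quoted above and a decimal terabyte of 10^12 bytes. The `tb` parameter lets you switch to a binary unit (2^40 bytes) if your pricing is stated per TiB. The function name is purely illustrative.

```python
def on_demand_cost_usd(bytes_scanned: float,
                       price_per_tb: float = 5.0,
                       tb: float = 1e12) -> float:
    """Estimate an on-demand query bill from the number of bytes scanned.

    Assumes the $5/TB rate from the text and a decimal terabyte (1e12 bytes);
    pass tb=2**40 to bill per binary TiB instead.
    """
    return bytes_scanned / tb * price_per_tb


if __name__ == "__main__":
    one_pb = 1e15  # 1 PB in decimal bytes = 1,000 TB
    print(on_demand_cost_usd(one_pb))
```

For 1 PB this returns 5000.0, matching the figure above. With binary units (2^50 bytes billed per 2^40-byte TiB), the same calculation gives 5120.0.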
Well, the solution hinges on one crucial fact: PB-scale analytics usually only include aggregated…
“An inch of time is an inch of gold.” The transience of time has been known since time immemorial. Unlike trekking across an open plain or diving into the ocean’s depths, we can neither roam freely nor remain stationary in time. With every tick of the clock, we are faced with the relentless advancement of time.
Time’s irreversibility seems paradoxical: Einstein’s theory of relativity unified space and time, and the laws of physics are (approximately) the same in all directions, including forward and backward in time. Yet our Universe seems to have picked out a unique direction: the arrow…
I can still vividly recall the events of July 4th, 2012. The alarm clock woke me up around 3 a.m. I was slightly groggy, but I quickly collected myself, remembering the significance of the moment.
I was still a graduate student at Princeton at the time. There was a celebratory “party” that I decided not to attend, given the 40-minute commute. As I would later find out, I might have missed a chance to appear in a few frames of the acclaimed documentary Particle Fever.
In the darkness of my room, I quietly turned on…
BigQuery provides a convenient and cheap serverless framework for running data analytics and algorithms at scale (you can sign up for a free account). However, its SQL frontend might seem like a rather stringent constraint.
With the power of user-defined functions (UDFs), one can run all sorts of massively parallel algorithms at scale. What better way to illustrate this than by running Monte Carlo (MC) simulations?
Here’s a pro-tip: BigQuery’s cost structure depends only on the amount of data queried. What about generating random numbers? It doesn’t…
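As a reminder of what an MC simulation looks like, here is the textbook example of estimating π. It draws random points in the unit square and counts the fraction that land inside the quarter circle. This is a plain-Python illustration of the technique, not the article's BigQuery implementation. The function name and seed are hypothetical. In a SQL engine the same pattern would roughly map to one random sample per row (for example using BigQuery's `RAND()`), followed by an aggregate count. That mapping is an assumption here, not something the text above spells out.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi from uniform points in the unit square.

    The fraction of points with x^2 + y^2 <= 1 approximates pi/4.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

The statistical error shrinks roughly as 1/√n. That is why MC methods benefit from running very large numbers of independent samples in parallel.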
From the “invisible force” to the “harbinger of chaos,” you may have heard quite a few sensational phrases describing entropy.
But what is it really? The equations—frequently misunderstood—tell a more humbling story.
In many cases, entropy doesn’t capture anything particularly deep about a physical system. In fact, it says more about our understanding of the system than the system itself.
The main punchline is:
entropy measures our ignorance of a system
Let’s dissect how and why this is the proper way to understand entropy.
Starting from the beginning, the classical definition of entropy in physics, S, is given by the…
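The article's own definition is cut off above. To make the "ignorance" punchline concrete, the sketch below assumes the Gibbs (equivalently Shannon) form, S = −k_B Σ p_i ln p_i, in units where k_B = 1. For a uniform distribution over Ω microstates this reduces to k_B ln Ω. The function name is illustrative. The point is that S depends only on the probabilities we assign to the microstates, that is, on what we know about the system.

```python
import math
from typing import Sequence


def gibbs_entropy(probs: Sequence[float]) -> float:
    """S = -sum_i p_i ln p_i with k_B = 1; zero-probability states contribute 0."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(gibbs_entropy([1.0, 0.0, 0.0, 0.0]))    # we know the exact microstate
    print(gibbs_entropy([0.5, 0.5, 0.0, 0.0]))    # we know it is one of two
    print(gibbs_entropy([0.25, 0.25, 0.25, 0.25]))  # we know nothing beyond "one of four"
```

The same four-state system gets entropy 0, ln 2 or ln 4 depending on how much we know about it. This is exactly the sense in which entropy measures our ignorance.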
BigQuery (BQ) has become a popular way of managing large databases and running ad-hoc queries. BQ can be very cost-efficient, as it charges by the amount of data queried ($5/TB) rather than by computation time. Thus, running computations in BQ can be far cheaper than running jobs on Hadoop or Spark.
However, the SQL frontend comes with restrictions. A common computational task involves creating multiple outputs. How does BQ deal with storing output data?
This can be achieved through Data Manipulation Language (DML) statements, which, together with table-creation statements, allow us to store the results of a computation in tables…
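To make "multiple outputs" concrete, here is a minimal sketch that assembles a single multi-statement script. Each query's result is written into its own table with a DML `INSERT INTO … SELECT …` statement. It assumes the destination tables already exist with matching schemas, created beforehand with a `CREATE TABLE` statement, which is technically DDL. The dataset, table names and queries are hypothetical. This only builds the SQL text and is not necessarily the workflow the article goes on to describe.

```python
from typing import Mapping


def build_insert_script(dataset: str, outputs: Mapping[str, str]) -> str:
    """Assemble one script with an INSERT ... SELECT per destination table.

    `dataset` and the table -> query pairs in `outputs` are hypothetical;
    each destination table is assumed to exist with a compatible schema.
    """
    statements = [
        # Strip any trailing semicolon so each statement ends with exactly one.
        f"INSERT INTO `{dataset}.{table}`\n{query.strip().rstrip(';')};"
        for table, query in outputs.items()
    ]
    return "\n\n".join(statements)


if __name__ == "__main__":
    script = build_insert_script(
        "my_dataset",
        {
            "daily_totals": "SELECT day, SUM(amount) AS total FROM my_dataset.sales GROUP BY day",
            "top_customers": "SELECT customer, SUM(amount) AS total FROM my_dataset.sales "
                             "GROUP BY customer ORDER BY total DESC LIMIT 10",
        },
    )
    print(script)
```

Because the statements are joined into one script, they can be submitted as a single job (for example through the `google-cloud-bigquery` client's `Client.query`). The two outputs then land in separate tables in one run.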