
Weather Aware Sampling and Comparison of Climate Data: An Alternative Way to Better Climate Prediction


  • speakers: Mike Bauer (NASA GISS, NASA Postdoctoral Fellow), George Tselioudis, Bill Rows, climate research
    • host: Claire Monteleoni, CCLS, CCLS Conference Room, Suite 850, 475 Riverside Dr.
  • abstract: Climate researchers such as myself often face a flurry of questions from friends, family and others following unusual weather events. These questions range from gleeful challenges such as "Where's global warming now?" to breathless worries of "How bad will it get?" Following a deep sigh the common retort to this is that climatologists study climate not weather and as the old saw goes "weather is not climate." Of course this is only half-true as climatologists do study weather, only statistically, and weather is indeed a main ingredient of climate, and yes, even those pesky unusual weather events contribute to it.
    • It shouldn't be surprising then to learn that climate models simulate weather. What may be surprising though is that the correctness of this simulated weather is rarely assessed directly. Instead, traditional methods of model validation rely on long-term averages, which is consistent with the "weather is not climate" sentiment.
    • An alternative approach to model validation will be presented, one that makes use of our knowledge of weather processes, such as their patterns of occurrence, structure and behavior, to test climate models in a new and informative way. In this way we can broaden traditional methods of climate model validation without replacing them, which is to say that we aim to merge the context afforded by the case-by-case perspective of the meteorologist with the statistical vantage of the climatologist. To do this we have a method for identifying, following and delimiting a target weather phenomenon (in this case mid-latitude cyclones). We will show how this tool can be used to identify specific model deficiencies as well as open up new research possibilities.


  • climate model - layers, time, variables, model experiments, models - all can be varied and compared independently
  • data - remote sensing, weather balloons, satellites
    • problems: incomplete, sparse, with differing coverage and sampling intervals in time and space
    • problem: not forecasts, just recordings
  • traditional method - averaging over time series (long-term average to observations)
    • problem: averaging is lossy compression, applied too often and discarding too much; it is then unclear why a difference or loss appears
    • solution: trying to switch from an Eulerian to a Lagrangian point of view (i.e. following a cloud instead of watching a fixed point in space)
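
To make the "lossy compression" point concrete, here is a minimal sketch with two made-up daily sea-level pressure series at a single grid point. The values, the 990 hPa threshold and all variable names are illustrative assumptions, not data or code from the talk.

```python
import numpy as np

# Hypothetical daily sea-level pressure (hPa) at one grid point, 360 days.
calm = np.full(360, 1010.0)      # no weather events at all
stormy = np.full(360, 1014.5)    # slightly higher background pressure...
stormy[::9] = 974.0              # ...interrupted by 40 deep-low days

# Traditional comparison: the long-term means are indistinguishable.
mean_calm = calm.mean()
mean_stormy = stormy.mean()

# Event-aware comparison: count days below an (assumed) storm threshold.
threshold = 990.0
low_days_calm = int((calm < threshold).sum())
low_days_stormy = int((stormy < threshold).sum())

print(f"means: {mean_calm:.1f} vs {mean_stormy:.1f} hPa")
print(f"days below {threshold:.0f} hPa: {low_days_calm} vs {low_days_stormy}")
```

Both series average to the same value, so a comparison of long-term means alone cannot tell them apart, while a simple count of low-pressure days can.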

tool demonstration

  • atomic - samplers that can pick out a specific time or place, or a large domain
  • event-based extraction - simple thresholds can find certain events
  • phenomenon-based extraction - connected events, Lagrangian, specific answers (but, given too much data, must now answer why)
    • simple partitioning of data to quickly parse unique events
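
As a rough illustration of event-based extraction with a simple threshold, the sketch below groups consecutive below-threshold time steps into events. The function name `extract_events`, the sample values and the 1000 hPa threshold are hypothetical and not taken from the tool shown in the talk.

```python
import numpy as np

def extract_events(series, threshold):
    """Return (start, end) index pairs for runs where series < threshold.

    `end` is exclusive, so series[start:end] is the whole event.
    """
    below = np.asarray(series) < threshold
    # Pad with False so runs touching either end of the series are closed.
    padded = np.concatenate(([False], below, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)    # False -> True transitions
    ends = np.flatnonzero(changes == -1)     # True -> False transitions
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical pressure time series (hPa) at one location.
pressure = [1012, 998, 995, 1005, 1010, 990, 1011]
events = extract_events(pressure, threshold=1000)
print(events)
```

Each returned pair is one connected event in time; a phenomenon-based extraction would additionally connect such events across neighbouring grid points.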

extra-tropical cyclones

  • why extra-tropical cyclones as example?
    • easy to find, well understood; characteristic scale but interesting variability; imprint seen in observations and models
    • climate change, feedback and uncertainty (no analog in the past for what will happen in the future)
      • sensitivities not captured in models but need to predict these minor variables for future use
    • one example of finding characteristics in a large amount of data through this kind of filter; an instance of a larger, generic class of problems

traditional problems

  • traditional method uses mass of features
  • new method uses mass distribution using sea-level projection
    • advantage: finds low-pressure, radial events from a Lagrangian view; same size/pressure calculation
  • challenge: find and track -- currently just use a filtering criterion
    • ML can do finding faster based on identifying individuals
    • minima in local pressure field; local and regional (characteristic sizes at both scales); Laplacian (curvature) of the pressure field (second derivative)
    • tracking - similarity and proximity, i.e. simply computed with a general cost function
    • mostly a single choice - usually just whether or not to connect a single instance, not multiple points; still a problem because some instances can merge and split (e.g. like waves in a fluid, with some supervised criterion)
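
The find-and-track steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function names, the 3x3 neighbourhood, the 1010 hPa cutoff and the cost weights are all hypothetical, and merging and splitting are not handled.

```python
import numpy as np

def find_pressure_minima(slp, max_pressure=1010.0):
    """Interior grid points that are strict 3x3 local minima below max_pressure."""
    slp = np.asarray(slp, dtype=float)
    centers = []
    for i in range(1, slp.shape[0] - 1):
        for j in range(1, slp.shape[1] - 1):
            window = slp[i - 1:i + 2, j - 1:j + 2]
            value = slp[i, j]
            strict_min = value == window.min() and np.count_nonzero(window == value) == 1
            if strict_min and value < max_pressure:
                centers.append((i, j))
    return centers

def laplacian(slp):
    """Five-point finite-difference Laplacian on interior points (unit grid spacing)."""
    slp = np.asarray(slp, dtype=float)
    lap = np.zeros_like(slp)
    lap[1:-1, 1:-1] = (slp[:-2, 1:-1] + slp[2:, 1:-1] +
                       slp[1:-1, :-2] + slp[1:-1, 2:] - 4.0 * slp[1:-1, 1:-1])
    return lap

def link_centers(prev, curr, max_cost=3.0, pressure_weight=0.1):
    """Greedy one-to-one linking of (row, col, pressure) centers between time steps.

    Assumed cost: grid distance (proximity) plus a weighted pressure
    difference (similarity). Pairs costing more than max_cost stay unlinked.
    """
    candidates = []
    for a, (r0, c0, p0) in enumerate(prev):
        for b, (r1, c1, p1) in enumerate(curr):
            cost = float(np.hypot(r1 - r0, c1 - c0)) + pressure_weight * abs(p1 - p0)
            if cost <= max_cost:
                candidates.append((cost, a, b))
    links, used_prev, used_curr = [], set(), set()
    for cost, a, b in sorted(candidates):   # cheapest links first
        if a not in used_prev and b not in used_curr:
            links.append((a, b))
            used_prev.add(a)
            used_curr.add(b)
    return sorted(links)

# Synthetic sea-level pressure field with one low centred at (4, 3).
y, x = np.mgrid[0:9, 0:9]
slp = 1015.0 - 30.0 * np.exp(-((y - 4) ** 2 + (x - 3) ** 2) / 8.0)
centers = find_pressure_minima(slp)
curvature_at_center = laplacian(slp)[4, 3]   # positive at a pressure minimum

# Two hypothetical time steps of detected centers: (row, col, pressure).
prev_centers = [(4, 3, 985.0), (1, 7, 1000.0)]
curr_centers = [(4, 4, 983.0), (7, 7, 1002.0)]
links = link_centers(prev_centers, curr_centers)
```

The greedy pass makes exactly the "connect or not connect" choice described above for each center, which is why merging and splitting systems would need extra logic.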

ML possibilities

  • cyclone attribution - potential for ML
    • partitioning - currently use largest set that encloses center (with iso-pressure line); segregate into cyclone vs. non-cyclone
    • seeded region growing - detection based on edge gradient (doesn't always work where there is an 'eye' in isomap)
      • nested centers can distort the attribution of grid cells to a specific depression
      • discrimination can still be achieved with time-based tracking? yes
      • also used Fourier harmonics to do merging and bifurcation (from instabilities)
  • GIS / context / data sort - potential for ML? (probably database/retrieval)
    • find nearest cyclone given a specific sample or location (i.e. frequency or confidence)
    • could be used for feature selection or for parameter estimation -- look at 'consequence' of this tracking from real-world elements and differing/detecting model faults
    • given classification, can find feature values from these sets -- correlate label to attribute values
  • cyclone composites - quality assessments (i.e. existing cluster analysis)
    • teach/learn a behavior given these weak/moderate/strong labels or a specific place
    • models can be used to both classify and generate cyclone examples
    • develop a taxonomy/lexicon for cyclone classification
    • goal: define a few specific parameters and find main regimes of variability....
      • define in a query-based structure
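
For the attribution step (segregating grid cells into cyclone vs. non-cyclone), the sketch below grows a region outward from a detected center. Note that it swaps in a simple pressure-threshold flood fill rather than the edge-gradient criterion or the largest-enclosing-isobar rule mentioned above; the function name `grow_region`, the 4-connectivity and the 1005 hPa contour level are assumptions for illustration.

```python
from collections import deque
import numpy as np

def grow_region(slp, seed, contour_level):
    """Boolean mask of cells 4-connected to `seed` with pressure <= contour_level."""
    slp = np.asarray(slp, dtype=float)
    mask = np.zeros(slp.shape, dtype=bool)
    if slp[seed] > contour_level:
        return mask                      # seed itself lies outside the contour
    mask[seed] = True
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            inside = 0 <= ni < slp.shape[0] and 0 <= nj < slp.shape[1]
            if inside and not mask[ni, nj] and slp[ni, nj] <= contour_level:
                mask[ni, nj] = True
                queue.append((ni, nj))
    return mask

# Hypothetical field: a 3x3 low around (2, 2) and a separate weak low at (2, 5).
slp = np.full((5, 7), 1012.0)
slp[1:4, 1:4] = 1000.0
slp[2, 2] = 990.0
slp[2, 5] = 998.0

cyclone_mask = grow_region(slp, seed=(2, 2), contour_level=1005.0)
print(cyclone_mask.sum(), "cells attributed to the cyclone centred at (2, 2)")
```

Because growth only follows connected cells, the neighbouring weak low is left out of the mask; nested centers inside one contour would still end up in the same region, which is the attribution problem noted above.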


  • Python - open source, mostly object-oriented, sometimes parallel (SMP, multi-core)
  • current applications - climate model validation, satellite data reduction, weather sensitive
    • ecology, oceanography, air quality, wind energy, insurance


  • run models then compare against observations?
    • climate models vs. real analysis (fill in information with a given model) - this real data can be used to find other events
    • usually climatologists would look at aggregate numbers
  • is this task model calibration?
    • yes, but also learning in general based on statistics to identify specific events and average them
  • is the radius of a cyclone large enough?
    • break up the world into general grids (geo and location); generally things aren't "resolved" until they span ~10 grid cells
    • grid cells are typically about 100 km on a side
  • cyclone vs. hurricane
    • some similarities; hurricanes mostly stay in the tropics, extra-tropical cyclones exist only outside them, typically poleward of 30 deg
    • hurricanes depend on sea temperature (narrow feature band); cyclones form around an unstable system in the gradient toward the poles, generally a dynamical instability
    • when dissipating they typically die toward the poles; always present but with a seasonal effect
  • combinatorial blow-up for filtering?
    • not exactly, choose local minima after filtering to avoid trying all possible regions
  • is this efficient? or just dynamic programming?
    • NumPy (Python/C sorting); fast enough to run through all the climate data in under an hour
  • in terms of extrapolating is it also important to find things where you don't know the exact properties
    • yes, but prior work finding these patterns is not unique; in the past was very non-robust
    • generally, these solutions will probably still be interesting for traditional meteorology discoveries; also may find patterns because more data is available now vs. then
  • sub-'scale' phenomena - effects of these may be significant, but they are missed because the data are filtered at the model scale
    • this problem is a little different, but ML is well suited to it; take highly-resolved models (more detail in physics and events) and use them to derive new models
  • sat data resolves to smaller scales, but could analyze empirical data instead of models?
    • possibly, but sat data is only a view from the top (or a slice) and may not have wind/temperature information
    • these data sources are typically incomplete, but physical model isn't really based on truth
    • do better with not constructing based on sat analysis (i.e. need other components that are contributing because observations are sparse)
  • what is the size of the model data?
    • model output ~20TB, with other data, can be up to PB; about 1TB a week from "a-train"


  • parameterize - cube of 100km to discretize atmosphere and data sections
    • a single value then represents that entire region - estimation/generation could be used
  • evaluation not 'validation' - models aren't perfect, better terminology
  • examples: "a-train" follows and can give different instrument readings
    • goal: put all of these together to look at a larger region with these data segments - need to be collected/normalized
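
The idea that a single value represents an entire grid box can be sketched as block averaging of a finer field. The function name `block_average` and the tiny 4x4 example are hypothetical; real model grids are not uniform in latitude and longitude, so treat this only as an illustration of the discretization.

```python
import numpy as np

def block_average(field, factor):
    """Average a 2-D field over non-overlapping factor x factor boxes."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("field dimensions must be divisible by factor")
    # Split each axis into (box index, position within box), then average
    # over the within-box axes.
    boxes = field.reshape(rows // factor, factor, cols // factor, factor)
    return boxes.mean(axis=(1, 3))

# Hypothetical fine-resolution field; each 2x2 block becomes one grid box.
fine = np.arange(16).reshape(4, 4)
coarse = block_average(fine, 2)
print(coarse)
```

Everything that varies inside a box is lost in the single averaged value, which is where estimating or generating sub-grid detail could come in.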

summary ideas / suggestions during 'brainstorming' -- autumn time frame

  • better ways to parameterize within the "grid boxes" - estimate processes that happen below that scale
    • vision could help to do 3d filling of these grid boxes
  • data-mining within large datasets of both model output and real data to improve predictions and model output
  • dealing with predictions of future with non-random class of models when compared to real-world; tracking?
    • for models very sensitive to reaction rates? need better estimates of reaction rates?
      • problem is that parameters within model are not real (i.e. gravity, ozone breakdown, etc)
    • distributed / parallel data mining? considered same problem


  • interactive, query-formulation systems could help?
    • may not necessarily apply to this problem
  • query system that can validate both systems (observations and model output)
    • use observation based results and try to sort through these missing parameters
    • alternative is to solve disjoint data between observed/model

interesting links and commentary
