What's the future of the pandas library?

Pandas is a powerful, open source Python library for data analysis, manipulation, and visualization. I've been teaching data scientists to use pandas since 2014, and in the years since, it has grown in popularity to an estimated 5 to 10 million users and become a "must-use" tool in the Python data science toolkit.
I started using pandas around version 0.14.0, and I've followed the library as it has significantly matured to its current version, 0.23.4. But numerous data scientists have asked me questions like these over the years:

  • "Is pandas reliable?"
  • "Will it keep working in the future?"
  • "Is it buggy? They haven't even released version 1.0!"
Version numbers can be used to signal the maturity of a product, and so I understand why someone might be hesitant to rely on "pre-1.0" software. But in the world of open source, version numbers don't necessarily tell you anything about the maturity or reliability of a library. Rather, they communicate the stability of the API. (And yes, pandas is both mature and reliable!)
In particular, version 1.0 signals to the user: "We've figured out what the API should look like, and so API-breaking changes will only occur with major releases (2.0, 3.0, etc.)." In other words, version 1.0 marks the point at which your code should never break just by upgrading to the next minor release.
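To make that versioning rule concrete, here's a minimal sketch in plain Python. The helper names (parse_version, upgrade_may_break_api) are hypothetical and not part of pandas. It assumes purely numeric version strings like "0.23.4", and it assumes that before 1.0 any release other than a patch release may change the API:

```python
def parse_version(version):
    """Split a numeric version string into ints, e.g. '0.23.4' -> (0, 23, 4)."""
    return tuple(int(part) for part in version.split(".")[:3])


def upgrade_may_break_api(current, target):
    """Return True if upgrading from `current` to `target` may break the API.

    Assumption (hypothetical policy, for illustration only):
    - before 1.0, any change in the first two components may break code
    - from 1.0 onward, only a major-version bump may break code
    """
    cur = parse_version(current)
    new = parse_version(target)
    if cur[0] == 0:
        # Pre-1.0: the API is not yet declared stable
        return new[:2] != cur[:2]
    # 1.0 and later: breaking changes are reserved for major releases
    return new[0] > cur[0]


print(upgrade_may_break_api("0.23.4", "0.24.0"))  # pre-1.0 minor upgrade
print(upgrade_may_break_api("1.0.0", "1.1.0"))    # post-1.0 minor upgrade
print(upgrade_may_break_api("1.1.0", "2.0.0"))    # major upgrade
```

Under these assumptions, only the first and third upgrades are flagged as potentially breaking. Your installed version is available as pd.__version__ if you want to plug it in.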
So the question remains: What's coming in pandas 1.0, and when is it coming?

Roadmap

According to the talk, here's the roadmap to pandas 1.0:

  • 0.23.4 was the most recent pandas release (August 2018).
  • 0.24 is targeted for the end of 2018, according to the GitHub milestone.
  • 0.25 is targeted for early 2019, and it will warn about all of the deprecations coming in 1.0.
  • 1.0 will be the same as 0.25, except all the deprecated features will be removed.
More details about the roadmap are available in the pandas sprint notes from July 2018, though all of these plans are subject to change.
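If a release warns about everything that will later be removed, you can check your own code for readiness by turning those warnings into errors. Here's a rough sketch using only Python's standard warnings module. It assumes the deprecations surface as FutureWarning (which pandas has commonly used, though that's not guaranteed for every deprecation), and old_api and new_api are hypothetical stand-ins rather than real pandas functions:

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn(
        "old_api is deprecated; use new_api instead",
        FutureWarning,
        stacklevel=2,
    )
    return 42


def new_api():
    """Hypothetical replacement that emits no warning."""
    return 42


def uses_deprecated_features(func):
    """Return True if calling `func` triggers a FutureWarning."""
    with warnings.catch_warnings():
        # Inside this block, FutureWarning is raised as an exception
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning:
            return True
    return False


print(uses_deprecated_features(old_api))
print(uses_deprecated_features(new_api))
```

In a real project, you'd wrap your own pandas code (or run your test suite with a similar warning filter) instead of these toy functions.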
