# ODBC to Parquet

[![Minimum required rust version](https://img.shields.io/badge/rust%201.51.0-min%20version-blue)](http://rust-lang.org)

A command line tool to query an ODBC data source and write the result into a parquet file.

* Small memory footprint. Only holds one batch at a time in memory.
* Fast. Makes efficient use of ODBC bulk reads, to lower IO overhead.
* Flexible. Query any ODBC data source you have a driver for. MySQL, MS SQL, Excel, ...

## Mapping of types in queries

The tool queries the ODBC Data source for type information and maps it to parquet type as such:

| ODBC SQL Type         | Parquet Type            |
|-----------------------|------------------------ |
| Decimal(p, s)         | Decimal(p,s)            |
| Numeric(p, s)         | Decimal(p,s)            |
| Bit                   | Boolean                 |
| Double                | Double                  |
| Real                  | Float                   |
| Float                 | Float                   |
| Tiny Integer          | Int8                    |
| Small Integer         | Int16                   |
| Integer               | Int32                   |
| Big Int               | Int64                   |
| Date                  | Date                    |
| Timestamp(p: 0..3)    | Timestamp Milliseconds  |
| Timestamp(p >= 4)     | Timestamp Microseconds  |
| Varbinary             | Byte Array              |
| Binary                | Fixed Length Byte Array |
| All others            | Utf8 Byte Array         |

`p` is short for `precision`. `s` is short for `scale`. Intervals are inclusive.

## Installation

### Prerequisites

To work with this tool you need an ODBC driver manager and an ODBC driver for the data source you want to access.

#### Windows

An ODBC driver manager is already preinstalled on windows. So is the `ODBC data sources (64Bit)` and `ODBC data sources (32Bit)` app which you can use to discover which drivers are already available on your system.

#### Linux

This tool links both at runtime and during build against `libodbc.so`. To get it you should install [unixODBC](http://www.unixodbc.org/). You can do this using your systems packet manager. For *ubuntu* you run:

```
sudo apt install unixodbc-dev
```

#### OS-X

This tool links both at runtime and during build against `libodbc.so`. To get it you should install [unixODBC](http://www.unixodbc.org/). To install it I recommend the [homebrew](https://brew.sh/) packet manager, which allows you to install it using:

```
brew install unixodbc
```

### Download binary from GitHub

<https://github.com/pacman82/odbc2parquet/releases/latest>

*Note*: Download the 32 Bit version if you want to connect to data sources using 32 Bit drivers and download the 64 Bit version if you want to connect via 64 Bit drivers. It won't work vice versa.

### Via Cargo

If you have a rust tool chain installed, you can install this tool via cargo.

```shell script
cargo install odbc2parquet
```

You can install `cargo` from here <https://rustup.rs/>.

## Usage

### Query using connection string

```bash
odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par  \
"SELECT * FROM Birthdays"
```

### Query using data source name

```bash
odbc2parquet query \
--dsn my_db \
--password "<YourStrong@Passw0rd>" \
--user "SA" \
out.par1 \
"SELECT * FROM Birthdays"
```

### List available ODBC drivers

```bash
odbc2parquet list-drivers
```

### List available ODBC data sources

```bash
odbc2parquet list-data-sources
```

### Use parameters in query

```shell
odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par  \
"SELECT * FROM Birthdays WHERE year > ? and year < ?" \
1990 2010
```

### Inserting data into a database

```shell
odbc2parquet insert \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
input.par \
MyTable
```

Use `odbc2parquet --help` to see all option.

## Links

Thanks to @samaguire there is a script for Powershell users which helps you to download a bunch of tables to a folder: <https://github.com/samaguire/odbc2parquet-PSscripts>
