DataFusion Functions
This page is generated from the Apache DataFusion project's documents:
- DataFusion Scalar Functions
- DataFusion Scalar Functions (NEW)
- DataFusion Aggregate Functions
- DataFusion Window Functions
Scalar Functions
Scalar functions operate on a single row at a time and return a single value.
Math Functions
- abs
- acos
- acosh
- asin
- asinh
- atan
- atan2
- atanh
- cbrt
- ceil
- cos
- cosh
- cot
- degrees
- exp
- factorial
- floor
- gcd
- isnan
- iszero
- lcm
- ln
- log
- log10
- log2
- nanvl
- pi
- pow
- power
- radians
- random
- round
- signum
- sin
- sinh
- sqrt
- tan
- tanh
- trunc
abs
Returns the absolute value of a number.
abs(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
acos
Returns the arc cosine or inverse cosine of a number.
acos(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
acosh
Returns the area hyperbolic cosine or inverse hyperbolic cosine of a number.
acosh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
asin
Returns the arc sine or inverse sine of a number.
asin(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
asinh
Returns the area hyperbolic sine or inverse hyperbolic sine of a number.
asinh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
atan
Returns the arc tangent or inverse tangent of a number.
atan(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
atan2
Returns the arc tangent or inverse tangent of expression_y / expression_x
.
atan2(expression_y, expression_x)
Arguments
- expression_y: First numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
- expression_x: Second numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
atanh
Returns the area hyperbolic tangent or inverse hyperbolic tangent of a number.
atanh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
cbrt
Returns the cube root of a number.
cbrt(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
ceil
Returns the nearest integer greater than or equal to a number.
ceil(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
cos
Returns the cosine of a number.
cos(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
cosh
Returns the hyperbolic cosine of a number.
cosh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
cot
Returns the cotangent of a number.
cot(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
degrees
Converts radians to degrees.
degrees(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
exp
Returns the base-e exponential of a number.
exp(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
factorial
Factorial. Returns 1 if value is less than 2.
factorial(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
floor
Returns the nearest integer less than or equal to a number.
floor(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
gcd
Returns the greatest common divisor of expression_x
and expression_y
. Returns 0 if both inputs are zero.
gcd(expression_x, expression_y)
Arguments
- expression_x: First numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_y: Second numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
isnan
Returns true if a given number is +NaN or -NaN otherwise returns false.
isnan(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
iszero
Returns true if a given number is +0.0 or -0.0 otherwise returns false.
iszero(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
lcm
Returns the least common multiple of expression_x
and expression_y
. Returns 0 if either input is zero.
lcm(expression_x, expression_y)
Arguments
- expression_x: First numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_y: Second numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
ln
Returns the natural logarithm of a number.
ln(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
log
Returns the base-x logarithm of a number. Can either provide a specified base, or if omitted then takes the base-10 of a number.
log(base, numeric_expression)
log(numeric_expression)
Arguments
- base: Base numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
log10
Returns the base-10 logarithm of a number.
log10(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
log2
Returns the base-2 logarithm of a number.
log2(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
nanvl
Returns the first argument if it's not NaN. Returns the second argument otherwise.
nanvl(expression_x, expression_y)
Arguments
- expression_x: Numeric expression to return if it's not NaN. Can be a constant, column, or function, and any combination of arithmetic operators.
- expression_y: Numeric expression to return if the first expression is NaN. Can be a constant, column, or function, and any combination of arithmetic operators.
pi
Returns an approximate value of π.
pi()
pow
Alias of power.
power
Returns a base expression raised to the power of an exponent.
power(base, exponent)
Arguments
- base: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- exponent: Exponent numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
Aliases
- pow
radians
Converts degrees to radians.
radians(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
random
Returns a random float value in the range [0, 1). The random seed is unique to each row.
random()
round
Rounds a number to the nearest integer.
round(numeric_expression[, decimal_places])
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- decimal_places: Optional. The number of decimal places to round to. Defaults to 0.
signum
Returns the sign of a number.
Negative numbers return -1
.
Zero and positive numbers return 1
.
signum(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
sin
Returns the sine of a number.
sin(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
sinh
Returns the hyperbolic sine of a number.
sinh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
sqrt
Returns the square root of a number.
sqrt(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
tan
Returns the tangent of a number.
tan(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
tanh
Returns the hyperbolic tangent of a number.
tanh(numeric_expression)
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
trunc
Truncates a number to a whole number or truncated to the specified decimal places.
trunc(numeric_expression[, decimal_places])
Arguments
- numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
- decimal_places: Optional. The number of decimal places to
truncate to. Defaults to 0 (truncate to a whole number). If
decimal_places
is a positive integer, truncates digits to the right of the decimal point. Ifdecimal_places
is a negative integer, replaces digits to the left of the decimal point with0
.
Conditional Functions
coalesce
Returns the first of its arguments that is not null. Returns null if all arguments are null. This function is often used to substitute a default value for null values.
coalesce(expression1[, ..., expression_n])
Arguments
- expression1, expression_n: Expression to use if previous expressions are null. Can be a constant, column, or function, and any combination of arithmetic operators. Pass as many expression arguments as necessary.
Example
> select coalesce(null, null, 'datafusion');
+----------------------------------------+
| coalesce(NULL,NULL,Utf8("datafusion")) |
+----------------------------------------+
| datafusion |
+----------------------------------------+
greatest
Returns the greatest value in a list of expressions. Returns null if all expressions are null.
greatest(expression1[, ..., expression_n])
Arguments
- expression1, expression_n: Expressions to compare and return the greatest value.. Can be a constant, column, or function, and any combination of arithmetic operators. Pass as many expression arguments as necessary.
Example
> select greatest(4, 7, 5);
+---------------------------+
| greatest(4,7,5) |
+---------------------------+
| 7 |
+---------------------------+
ifnull
Alias of nvl.
least
Returns the smallest value in a list of expressions. Returns null if all expressions are null.
least(expression1[, ..., expression_n])
Arguments
- expression1, expression_n: Expressions to compare and return the smallest value. Can be a constant, column, or function, and any combination of arithmetic operators. Pass as many expression arguments as necessary.
Example
> select least(4, 7, 5);
+---------------------------+
| least(4,7,5) |
+---------------------------+
| 4 |
+---------------------------+
nullif
Returns null if expression1 equals expression2; otherwise it returns expression1.
This can be used to perform the inverse operation of coalesce
.
nullif(expression1, expression2)
Arguments
- expression1: Expression to compare and return if equal to expression2. Can be a constant, column, or function, and any combination of operators.
- expression2: Expression to compare to expression1. Can be a constant, column, or function, and any combination of operators.
Example
> select nullif('datafusion', 'data');
+-----------------------------------------+
| nullif(Utf8("datafusion"),Utf8("data")) |
+-----------------------------------------+
| datafusion |
+-----------------------------------------+
> select nullif('datafusion', 'datafusion');
+-----------------------------------------------+
| nullif(Utf8("datafusion"),Utf8("datafusion")) |
+-----------------------------------------------+
| |
+-----------------------------------------------+
nvl
Returns expression2 if expression1 is NULL otherwise it returns expression1.
nvl(expression1, expression2)
Arguments
- expression1: Expression to return if not null. Can be a constant, column, or function, and any combination of operators.
- expression2: Expression to return if expr1 is null. Can be a constant, column, or function, and any combination of operators.
Example
> select nvl(null, 'a');
+---------------------+
| nvl(NULL,Utf8("a")) |
+---------------------+
| a |
+---------------------+\
> select nvl('b', 'a');
+--------------------------+
| nvl(Utf8("b"),Utf8("a")) |
+--------------------------+
| b |
+--------------------------+
Aliases
- ifnull
nvl2
Returns expression2 if expression1 is not NULL; otherwise it returns expression3.
nvl2(expression1, expression2, expression3)
Arguments
- expression1: Expression to test for null. Can be a constant, column, or function, and any combination of operators.
- expression2: Expression to return if expr1 is not null. Can be a constant, column, or function, and any combination of operators.
- expression3: Expression to return if expr1 is null. Can be a constant, column, or function, and any combination of operators.
Example
> select nvl2(null, 'a', 'b');
+--------------------------------+
| nvl2(NULL,Utf8("a"),Utf8("b")) |
+--------------------------------+
| b |
+--------------------------------+
> select nvl2('data', 'a', 'b');
+----------------------------------------+
| nvl2(Utf8("data"),Utf8("a"),Utf8("b")) |
+----------------------------------------+
| a |
+----------------------------------------+
String Functions
- ascii
- bit_length
- btrim
- char_length
- character_length
- chr
- concat
- concat_ws
- contains
- ends_with
- find_in_set
- initcap
- instr
- left
- length
- levenshtein
- lower
- lpad
- ltrim
- octet_length
- overlay
- position
- repeat
- replace
- reverse
- right
- rpad
- rtrim
- split_part
- starts_with
- strpos
- substr
- substr_index
- substring
- substring_index
- to_hex
- translate
- trim
- upper
- uuid
ascii
Returns the Unicode character code of the first character in a string.
ascii(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select ascii('abc');
+--------------------+
| ascii(Utf8("abc")) |
+--------------------+
| 97 |
+--------------------+
> select ascii('🚀');
+-------------------+
| ascii(Utf8("🚀")) |
+-------------------+
| 128640 |
+-------------------+
Related functions:
bit_length
Returns the bit length of a string.
bit_length(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select bit_length('datafusion');
+--------------------------------+
| bit_length(Utf8("datafusion")) |
+--------------------------------+
| 80 |
+--------------------------------+
Related functions:
btrim
Trims the specified trim string from the start and end of a string. If no trim string is provided, all whitespace is removed from the start and end of the input string.
btrim(str[, trim_str])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- trim_str: String expression to operate on. Can be a constant, column, or function, and any combination of operators. Default is whitespace characters.
Example
> select btrim('__datafusion____', '_');
+-------------------------------------------+
| btrim(Utf8("__datafusion____"),Utf8("_")) |
+-------------------------------------------+
| datafusion |
+-------------------------------------------+
Alternative Syntax
trim(BOTH trim_str FROM str)
trim(trim_str FROM str)
Aliases
- trim
Related functions:
char_length
Alias of character_length.
character_length
Returns the number of characters in a string.
character_length(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select character_length('Ångström');
+------------------------------------+
| character_length(Utf8("Ångström")) |
+------------------------------------+
| 8 |
+------------------------------------+
Aliases
- length
- char_length
Related functions:
chr
Returns the character with the specified ASCII or Unicode code value.
chr(expression)
Arguments
- expression: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select chr(128640);
+--------------------+
| chr(Int64(128640)) |
+--------------------+
| 🚀 |
+--------------------+
Related functions:
concat
Concatenates multiple strings together.
concat(str[, ..., str_n])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- str_n: Subsequent string expressions to concatenate.
Example
> select concat('data', 'f', 'us', 'ion');
+-------------------------------------------------------+
| concat(Utf8("data"),Utf8("f"),Utf8("us"),Utf8("ion")) |
+-------------------------------------------------------+
| datafusion |
+-------------------------------------------------------+
Related functions:
concat_ws
Concatenates multiple strings together with a specified separator.
concat_ws(separator, str[, ..., str_n])
Arguments
- separator: Separator to insert between concatenated strings.
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- str_n: Subsequent string expressions to concatenate.
Example
> select concat_ws('_', 'data', 'fusion');
+--------------------------------------------------+
| concat_ws(Utf8("_"),Utf8("data"),Utf8("fusion")) |
+--------------------------------------------------+
| data_fusion |
+--------------------------------------------------+
Related functions:
contains
Return true if search_str is found within string (case-sensitive).
contains(str, search_str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- search_str: The string to search for in str.
Example
> select contains('the quick brown fox', 'row');
+---------------------------------------------------+
| contains(Utf8("the quick brown fox"),Utf8("row")) |
+---------------------------------------------------+
| true |
+---------------------------------------------------+
ends_with
Tests if a string ends with a substring.
ends_with(str, substr)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- substr: Substring to test for.
Example
> select ends_with('datafusion', 'soin');
+--------------------------------------------+
| ends_with(Utf8("datafusion"),Utf8("soin")) |
+--------------------------------------------+
| false |
+--------------------------------------------+
> select ends_with('datafusion', 'sion');
+--------------------------------------------+
| ends_with(Utf8("datafusion"),Utf8("sion")) |
+--------------------------------------------+
| true |
+--------------------------------------------+
find_in_set
Returns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings.
find_in_set(str, strlist)
Arguments
- str: String expression to find in strlist.
- strlist: A string list is a string composed of substrings separated by , characters.
Example
> select find_in_set('b', 'a,b,c,d');
+----------------------------------------+
| find_in_set(Utf8("b"),Utf8("a,b,c,d")) |
+----------------------------------------+
| 2 |
+----------------------------------------+
initcap
Capitalizes the first character in each word in the input string. Words are delimited by non-alphanumeric characters.
initcap(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select initcap('apache datafusion');
+------------------------------------+
| initcap(Utf8("apache datafusion")) |
+------------------------------------+
| Apache Datafusion |
+------------------------------------+
Related functions:
instr
Alias of strpos.
left
Returns a specified number of characters from the left side of a string.
left(str, n)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- n: Number of characters to return.
Example
> select left('datafusion', 4);
+-----------------------------------+
| left(Utf8("datafusion"),Int64(4)) |
+-----------------------------------+
| data |
+-----------------------------------+
Related functions:
length
Alias of character_length.
levenshtein
Returns the Levenshtein distance
between the two given strings.
levenshtein(str1, str2)
Arguments
- str1: String expression to compute Levenshtein distance with str2.
- str2: String expression to compute Levenshtein distance with str1.
Example
> select levenshtein('kitten', 'sitting');
+---------------------------------------------+
| levenshtein(Utf8("kitten"),Utf8("sitting")) |
+---------------------------------------------+
| 3 |
+---------------------------------------------+
lower
Converts a string to lower-case.
lower(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select lower('Ångström');
+-------------------------+
| lower(Utf8("Ångström")) |
+-------------------------+
| ångström |
+-------------------------+
Related functions:
lpad
Pads the left side of a string with another string to a specified string length.
lpad(str, n[, padding_str])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- n: String length to pad to.
- padding_str: Optional string expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.
Example
> select lpad('Dolly', 10, 'hello');
+---------------------------------------------+
| lpad(Utf8("Dolly"),Int64(10),Utf8("hello")) |
+---------------------------------------------+
| helloDolly |
+---------------------------------------------+
Related functions:
ltrim
Trims the specified trim string from the beginning of a string. If no trim string is provided, all whitespace is removed from the start of the input string.
ltrim(str[, trim_str])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- trim_str: String expression to trim from the beginning of the input string. Can be a constant, column, or function, and any combination of arithmetic operators. Default is whitespace characters.
Example
> select ltrim(' datafusion ');
+-------------------------------+
| ltrim(Utf8(" datafusion ")) |
+-------------------------------+
| datafusion |
+-------------------------------+
> select ltrim('___datafusion___', '_');
+-------------------------------------------+
| ltrim(Utf8("___datafusion___"),Utf8("_")) |
+-------------------------------------------+
| datafusion___ |
+-------------------------------------------+
Alternative Syntax
trim(LEADING trim_str FROM str)
Related functions:
octet_length
Returns the length of a string in bytes.
octet_length(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select octet_length('Ångström');
+--------------------------------+
| octet_length(Utf8("Ångström")) |
+--------------------------------+
| 10 |
+--------------------------------+
Related functions:
overlay
Returns the string which is replaced by another string from the specified position and specified count length.
overlay(str PLACING substr FROM pos [FOR count])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- substr: Substring to replace in str.
- pos: The start position to start the replace in str.
- count: The count of characters to be replaced from start position of str. If not specified, will use substr length instead.
Example
> select overlay('Txxxxas' placing 'hom' from 2 for 4);
+--------------------------------------------------------+
| overlay(Utf8("Txxxxas"),Utf8("hom"),Int64(2),Int64(4)) |
+--------------------------------------------------------+
| Thomas |
+--------------------------------------------------------+
position
Alias of strpos.
repeat
Returns a string with an input string repeated a specified number.
repeat(str, n)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- n: Number of times to repeat the input string.
Example
> select repeat('data', 3);
+-------------------------------+
| repeat(Utf8("data"),Int64(3)) |
+-------------------------------+
| datadatadata |
+-------------------------------+
replace
Replaces all occurrences of a specified substring in a string with a new substring.
replace(str, substr, replacement)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- substr: Substring expression to replace in the input string. Substring expression to operate on. Can be a constant, column, or function, and any combination of operators.
- replacement: Replacement substring expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select replace('ABabbaBA', 'ab', 'cd');
+-------------------------------------------------+
| replace(Utf8("ABabbaBA"),Utf8("ab"),Utf8("cd")) |
+-------------------------------------------------+
| ABcdbaBA |
+-------------------------------------------------+
reverse
Reverses the character order of a string.
reverse(str)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
> select reverse('datafusion');
+-----------------------------+
| reverse(Utf8("datafusion")) |
+-----------------------------+
| noisufatad |
+-----------------------------+
right
Returns a specified number of characters from the right side of a string.
right(str, n)
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- n: Number of characters to return.
Example
> select right('datafusion', 6);
+------------------------------------+
| right(Utf8("datafusion"),Int64(6)) |
+------------------------------------+
| fusion |
+------------------------------------+
Related functions:
rpad
Pads the right side of a string with another string to a specified string length.
rpad(str, n[, padding_str])
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of operators.
- n: String length to pad to.
- padding_str: String expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.