Skip to content

MDEV-23987 Make udf_example.c portable and warning-free#4966

Open
mertbrn wants to merge 1 commit intoMariaDB:mainfrom
mertbrn:MDEV-23987
Open

MDEV-23987 Make udf_example.c portable and warning-free#4966
mertbrn wants to merge 1 commit intoMariaDB:mainfrom
mertbrn:MDEV-23987

Conversation

@mertbrn
Copy link
Copy Markdown

@mertbrn mertbrn commented Apr 21, 2026

Fixes https://jira.mariadb.org/browse/MDEV-23987

  • Replace bzero() with memset() and strmov() with strcpy()
  • Fix length calculation in lookup() and reverse_lookup() since strcpy() returns dest (not end like strmov/stpcpy)
  • Remove redundant includes; rely on mysql.h transitively
  • Remove #define bzero/strmov shim that caused a redefinition warning
  • Document avg2, is_const, check_const_len in header comments and add their CREATE/DROP examples

Copilot AI review requested due to automatic review settings April 21, 2026 18:38
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 21, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves the portability of sql/udf_example.c by removing non-portable APIs/macros and updating documentation for the included example UDFs.

Changes:

  • Updated header documentation to describe additional UDFs and include CREATE/DROP examples.
  • Replaced bzero()/strmov() usage with standard C equivalents (memset()/strcpy()), and fixed result-length calculations accordingly.
  • Removed several includes/shims in an attempt to rely on mysql.h transitively.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sql/udf_example.c Outdated
Comment on lines 142 to 152
#ifdef STANDARD
/* STANDARD is defined, don't use any mysql functions */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
typedef unsigned __int64 ulonglong; /* Microsofts 64 bit types */
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*_WIN32*/
#else
#include "mariadb.h"
#include <my_sys.h>
#if defined(MYSQL_SERVER)
#include <m_string.h> /* To get strmov() */
#else
/* when compiled as standalone */
#include <string.h>
#define strmov(a,b) stpcpy(a,b)
#define bzero(a,b) memset(a,0,b)
#endif
#endif
#include <mysql.h>
#include <ctype.h>

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the previous include block, this file no longer includes headers that define the APIs/types it uses (eg. strcpy/memset/strlen/memcpy/malloc/free/sprintf/isalpha, pthread_mutex_t/pthread_mutex_* and MY_MUTEX_INIT_SLOW, plus typedefs like uint/uchar from my_global.h). <mysql.h> does not provide these, so this change will reintroduce build failures or implicit-declaration warnings on stricter compilers. Please add the required standard headers (at least <string.h>, <stdlib.h>, <stdio.h>, <ctype.h>) and either include the needed server headers (eg. sql/mariadb.h / my_global.h and my_pthread.h) or replace the internal typedefs/macros with portable equivalents (eg. use unsigned int/unsigned char and pass NULL to pthread_mutex_init).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 970a251: Added my_global.h/my_pthread.h back

Comment thread sql/udf_example.c Outdated
Comment on lines +798 to +800
memcpy(&in, *hostent->h_addr_list, sizeof(in.s_addr));
*res_length= (ulong) (strmov(result, inet_ntoa(in)) - result);
strcpy(result, inet_ntoa(in));
*res_length= (ulong) strlen(result);
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookup() copies inet_ntoa(in) into result with strcpy and then calls strlen(). However lookup_init() sets initid->max_length=11, while an IPv4 string can be up to 15 chars (plus '\0'), so this can overflow the server-provided result buffer. Please increase max_length to at least 15 and use a bounded copy (eg snprintf/strncpy with initid->max_length+1), or switch to inet_ntop which also avoids inet_ntoa's static-buffer/thread-safety issues.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 970a251: Raised initid->max_length to 15 (max IPv4 dotted-quad length) and replaced the unbounded strcpy + strlen with a bounded memcpy capped at initid->max_length, followed by explicit NUL-termination.

Comment thread sql/udf_example.c Outdated
Comment on lines +903 to +904
strcpy(result, hp->h_name);
*res_length= (ulong) strlen(result);
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse_lookup() copies hp->h_name into result with strcpy and then uses strlen(). reverse_lookup_init() sets initid->max_length=32, but hostnames can exceed 31 characters, so this can overflow the result buffer. Please either raise max_length to a safe value (eg 255) and/or use a bounded copy based on initid->max_length (ensuring NUL-termination) before computing *res_length.

Suggested change
strcpy(result, hp->h_name);
*res_length= (ulong) strlen(result);
length= (uint) strlen(hp->h_name);
if (*res_length)
{
if (length >= (uint) *res_length)
length= (uint) *res_length - 1;
memcpy(result, hp->h_name, length);
result[length]= 0;
}
else
length= 0;
*res_length= (ulong) length;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 970a251: Raised initid->max_length to 255 (POSIX HOST_NAME_MAX) and replaced strcpy(result, hp->h_name) with a bounded memcpy capped at initid->max_length, followed by explicit NUL-termination before setting *res_length.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sql/udf_example.c Outdated
Comment on lines +904 to +907
pthread_mutex_unlock(&LOCK_hostname);
#endif
*res_length=(ulong) (strmov(result,hp->h_name) - result);
length= (uint) strlen(hp->h_name);
if (length > initid->max_length)
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-reentrant path, LOCK_hostname is released immediately after gethostbyaddr(), but hp->h_name is then read afterwards. gethostbyaddr() typically returns pointers to static storage that can be overwritten by another thread right after the unlock, so strlen(hp->h_name)/the subsequent copy can race and return corrupted data. Keep the mutex held until after you've copied hp->h_name into result (or use the reentrant API path consistently).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4bde17b: moved unlock past the memcpy in both lookup() and reverse_lookup() so the hostent data is copied under the lock.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

sql/udf_example.c:745

  • LOCK_hostname is a single static mutex shared by both lookup/reverse_lookup, but it is initialized (and later destroyed) from lookup_init() which is invoked per SQL statement. This can lead to double-initialization and destroy-while-in-use races when multiple statements (or both UDFs) run concurrently. Prefer a static initializer (e.g., PTHREAD_MUTEX_INITIALIZER) / pthread_once for one-time init and avoid destroying it, or allocate a per-instance mutex stored in initid->ptr.
  initid->max_length=15;              /* "xxx.xxx.xxx.xxx" */
  initid->maybe_null=1;
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
  (void) pthread_mutex_init(&LOCK_hostname,MY_MUTEX_INIT_SLOW);
#endif

sql/udf_example.c:833

  • LOCK_hostname is a single static mutex, but reverse_lookup_init() also calls pthread_mutex_init() (and reverse_lookup_deinit() destroys it) on a per-statement basis. This can race with lookup_* and other concurrent statements, causing undefined behavior. Use a one-time static initialization (or pthread_once) and avoid destroying the global mutex, or make the lock instance-specific via initid->ptr.
  initid->max_length=255;             /* POSIX host name maximum */
  initid->maybe_null=1;
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
  (void) pthread_mutex_init(&LOCK_hostname,MY_MUTEX_INIT_SLOW);
#endif

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sql/udf_example.c
Comment on lines +864 to 867
snprintf(result, initid->max_length + 1, "%d.%d.%d.%d",
(int) *((longlong*) args->args[0]),
(int) *((longlong*) args->args[1]),
(int) *((longlong*) args->args[2]),
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reverse_lookup()'s string-argument path (the else branch after the 4-argument case), the current bounds logic can increase length beyond args->lengths[0] (e.g., when length == *res_length - 1, it sets length = *res_length), which can make memcpy() read past the end of the input buffer. Clamp length to the minimum of the input length and available space (leaving room for the NUL terminator), and avoid underflow if *res_length is 0/1.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4cecd59

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

sql/udf_example.c:745

  • LOCK_hostname is a process-global mutex, but it’s being initialized in lookup_init() and destroyed in lookup_deinit(). These init/deinit hooks run per-statement (and can run concurrently across sessions), so this can lead to double-init/destroy or destroying the mutex while another thread is using it (undefined behavior / crashes). Prefer static initialization (e.g., PTHREAD_MUTEX_INITIALIZER) or pthread_once() and do not destroy it in per-statement deinit; alternatively make the mutex per-UDF instance (store it in initid->ptr).
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
  (void) pthread_mutex_init(&LOCK_hostname,MY_MUTEX_INIT_SLOW);
#endif

sql/udf_example.c:833

  • reverse_lookup_init() also calls pthread_mutex_init(&LOCK_hostname, ...) on the same static mutex used by lookup(). If both UDFs are used in the same server (or concurrently), this can result in concurrent/double initialization. Use a single one-time initialization strategy (pthread_once or static initializer) shared by both functions.
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
  (void) pthread_mutex_init(&LOCK_hostname,MY_MUTEX_INIT_SLOW);
#endif

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sql/udf_example.c
Comment on lines 751 to 753
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
(void) pthread_mutex_destroy(&LOCK_hostname);
#endif
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destroying LOCK_hostname in lookup_deinit() is unsafe because other sessions/statements may still be inside lookup()/reverse_lookup() holding or about to lock the same global mutex. Avoid pthread_mutex_destroy() here; keep the mutex alive for the process lifetime (or refcount it) to prevent use-after-destroy races.

Copilot uses AI. Check for mistakes.
Comment thread sql/udf_example.c
Comment on lines 839 to 841
#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
(void) pthread_mutex_destroy(&LOCK_hostname);
#endif
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse_lookup_deinit() destroys the shared LOCK_hostname mutex, which can race with other threads executing lookup()/reverse_lookup() or with lookup_deinit() also destroying it. Don’t destroy a shared mutex in per-statement UDF deinit; keep it for the lifetime of the loaded UDF library or refcount it across users.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pre-existing mutex lifecycle issue as the _init threads. Out of scope for MDEV-23987. Happy to file a follow-up MDEV.

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Apr 22, 2026
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please always keep a single commit in your pull requests and make sure it has a commit message that's compliant with the MariaDB coding standards.

Comment thread sql/udf_example.c
*null_value= 1;
return 0;
}
pthread_mutex_unlock(&LOCK_hostname);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have you moved the lock boundary?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted in the squashed commit. The unlock is back to immediately after gethostbyname/gethostbyaddr. I had moved it on a Copilot suggestion about hostent's static buffer being overwritten after unlock, but it's out of scope for this MDEV.

Comment thread sql/udf_example.c Outdated
memcpy(result,args->args[0],length);
result[length]=0;
length= (uint) args->lengths[0];
if (length >= (uint) *res_length) /* leave room for NUL */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have you changed that? The maximum length you can store is *res_length (including the termination 0). length is sans the termination zero. hence the -1 in the original code.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. The original length >= *res_length - 1 / length = *res_length logic is restored.

Comment thread sql/udf_example.c Outdated
}
#endif
length= (uint) strlen(hp->h_name);
if (length > initid->max_length)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will again need to account for the terminating 0 here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the squashed commit. The hostname copy now uses if (length >= initid->max_length) length = initid->max_length - 1; so the NUL terminator fits.

Replace the bzero/strmov macro shim with direct memset()/strcpy()
calls and simplify the include block: always include <my_global.h>,
<my_pthread.h>, the standard C headers actually used, and <mysql.h>.
This removes the "bzero" redefinition warning triggered by the
previous STANDARD-vs-non-STANDARD ifdef split.

Fix the result-length computation in lookup() and reverse_lookup():
strcpy() returns the destination pointer, so the original
"strcpy(result, ...) - result" expression always evaluated to 0,
making both functions return empty strings.

Raise initid->max_length to 15 in lookup() and to 255 in
reverse_lookup(), and switch the dotted-quad path to snprintf. In
reverse_lookup()'s hostname copy, clamp the copy length to
initid->max_length - 1 so the NUL terminator fits.

Document avgcost, avg2, is_const and check_const_len in the header
comment block and add their CREATE/DROP FUNCTION examples.
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please stand by for the final review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants