TLDR

Check out https://github.com/momo1239 for proof-of-concept inputs.

CVE-2024-XYZA

A stack buffer overflow vulnerability exists in the charset handling functionality of html2xhtml version 1.3. An attacker can exploit this vulnerability by providing a specially crafted input, which would lead to the overflow of the ‘buf’ variable located on the stack. Successful exploitation of this vulnerability could allow an attacker to execute arbitrary code or crash the application, leading to denial of service.

Crash Out Phase

To reproduce the crash, let’s go to the project website and download the latest version, 1.3.

Once we have the tar file, we can extract it with tar xvf XYZ.tar and change into the extracted directory.

Run:

./configure
make
./html2xhtml poc.html

You should receive a segmentation fault. Let’s analyze this with AddressSanitizer (ASan).

Run:

make clean
make CFLAGS=-fsanitize=address
./html2xhtml poc.html

You should receive output similar to the following:

=================================================================
==3468537==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffde70 at pc 0x7ffff7493fc4 bp 0x7fffffffdc00 sp 0x7fffffffd3a8
READ of size 86 at 0x7fffffffde70 thread T0
    #0 0x7ffff7493fc3 in __interceptor_memmem ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:686
    #1 0x5555555f5f35 in read_charset_decl /home/kenny/Downloads/html2xhtml-1.3/src/charset.c:680
    #2 0x5555555f7d89 in guess_charset /home/kenny/Downloads/html2xhtml-1.3/src/charset.c:508
    #3 0x5555555f7d89 in charset_auto_detect /home/kenny/Downloads/html2xhtml-1.3/src/charset.c:343
    #4 0x555555568d49 in main /home/kenny/Downloads/html2xhtml-1.3/src/html2xhtml.c:100
    #5 0x7ffff7029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #6 0x7ffff7029e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #7 0x55555556b914 in _start (/home/kenny/Downloads/html2xhtml-1.3/src/html2xhtml+0x17914)

Address 0x7fffffffde70 is located in stack of thread T0 at offset 544 in frame
    #0 0x5555555e86bf in read_charset_decl /home/kenny/Downloads/html2xhtml-1.3/src/charset.c:536

  This frame has 1 object(s):
    [32, 544) 'buf' (line 537) <== Memory access at offset 544 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:686 in __interceptor_memmem
Shadow bytes around the buggy address:
  0x10007fff7b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
  0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007fff7bc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[f3]f3
  0x10007fff7bd0: f3 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10007fff7be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x10007fff7bf0: f1 f1 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10007fff7c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3468537==ABORTING

Root Cause Analysis

Let’s take a look at the source code and review the vulnerable function, read_charset_decl(). The function is over 100 lines long, so we’ll simplify it and focus on this particular loop:

  for (i = ini, len = 0; i < avail && len < SCAN_LEN; i += step, len++) {
    buf[len] = tolower(buffer[i]);
  }

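This loop copies data from the buffer array into buf, converting characters to lower case as it goes. The copy is capped by len < SCAN_LEN, so when avail is greater than SCAN_LEN the loop fills buf completely. The ASan report above flags a READ of size 86 inside memmem, called from read_charset_decl at charset.c:680, and the faulting offset, 544, is the first byte past the end of the 512-byte buf. In other words, the overflow happens when buf is searched after the copy: the search is given a length that runs past the upper bound of buf.

To mitigate this, there should be bounds checking so that any search over buf is limited to the len bytes that were actually copied and never extends past SCAN_LEN.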
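The mitigation can be sketched roughly as follows. This is a minimal illustration, not the upstream fix: it assumes buf is SCAN_LEN bytes and that SCAN_LEN is 512 (the size shown in the ASan report), and copy_lower and find_in_buf are hypothetical helper names that do not exist in html2xhtml.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SCAN_LEN 512 /* assumption: matches the 512-byte 'buf' in the ASan report */

/* Copy at most SCAN_LEN lowercased bytes from src into dst.
 * Returns the number of bytes actually copied. (Hypothetical helper.) */
static size_t copy_lower(char dst[SCAN_LEN], const char *src, size_t avail)
{
    size_t len = 0;
    while (len < avail && len < SCAN_LEN) {
        dst[len] = (char) tolower((unsigned char) src[len]);
        len++;
    }
    return len;
}

/* Search for needle within buf[start..len). Every comparison is checked
 * against len, so the search can never read past the copied bytes.
 * Returns the match offset, or (size_t) -1 if there is no match.
 * (Hypothetical helper.) */
static size_t find_in_buf(const char *buf, size_t len, size_t start,
                          const char *needle)
{
    size_t nlen = strlen(needle);

    if (start > len || nlen > len - start)
        return (size_t) -1;
    for (size_t i = start; i + nlen <= len; i++) {
        if (memcmp(buf + i, needle, nlen) == 0)
            return i;
    }
    return (size_t) -1;
}
```

The point of the design is that the length returned by the copy step is the only length the search step ever trusts, so a start offset or needle that does not fit is rejected before any byte is read.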

Description of CVE-2024-xxxx

A stack buffer overflow vulnerability was found in the internal_link_inline_dumper and external_link_inline_dumper functions of wiki2md 1.0.0. The vulnerability allows an attacker to overwrite the stack buffer, potentially leading to arbitrary code execution or denial of service.

CVE-2024-xxxx

I discovered a vulnerability in wiki2md, in the function internal_link_inline_dumper of dumper.c. wiki2md is a program that converts MediaWiki syntax documents to Markdown. C programs are often susceptible to memory corruption vulnerabilities, which can be exploited for malicious purposes, and writing secure C code requires careful attention to detail. One way to find memory corruption vulnerabilities is “fuzz testing”.

Fuzz testing is a software testing methodology in which a program is fed invalid or random data with the intent of revealing vulnerabilities and bugs. During the fuzzing process, I discovered a stack-based buffer overflow in internal_link_inline_dumper.
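To make the idea concrete, here is a minimal sketch of the mutation step that many fuzzers build on: take a valid seed input and flip a few random bits before feeding it to the target. It is not the fuzzer used for this finding (the write-up does not depend on any particular tool), and next_rand and mutate are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 generator so a run can be reproduced from its seed.
 * The state must be non-zero. (Hypothetical helper.) */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Flip `flips` randomly chosen bits in data[0..len).
 * The same bit may be picked more than once. (Hypothetical helper.) */
static void mutate(unsigned char *data, size_t len, unsigned flips,
                   uint32_t *state)
{
    if (len == 0)
        return;
    for (unsigned i = 0; i < flips; i++) {
        size_t pos = next_rand(state) % len;
        unsigned bit = next_rand(state) % 8;
        data[pos] ^= (unsigned char) (1u << bit);
    }
}
```

A driver would typically loop: copy the seed file, call mutate on the copy, write it out, run the target on it, and keep any input that makes the target crash.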

Finding the Crash Out!

Install wiki2md and compile the program.

git clone https://github.com/oelmekki/wiki2md
cd wiki2md
make

Run the program with the proof-of-concept wiki input.

./wiki2md poc.wiki

You will receive a segfault and the program will crash.

Identifying vuln

If you run the program under GDB with the malicious input, you can see the registers being overwritten by our input. Note that the return address was also overwritten.

gdb trace

This is most likely a stack-based buffer overflow, but let’s modify the Makefile to build with ASan and confirm the vulnerability.

Add -fsanitize=address -g to the compiler flags, rebuild, and rerun the program with the PoC input.

The output should confirm a buffer overflow vulnerability.

asan output

Root Cause Analysis

/*
 * Generates markdown for NODE_INTERNAL_LINK.
 *
 * More parsing is done here to match the various components
 * of the links.
 */
static int
internal_link_inline_dumper (dumping_params_t *params)
{
  int err = 0;
  char link_def[MAX_LINK_LENGTH] = {0};
  char *link_ptr = link_def;
  size_t link_max_len = MAX_LINK_LENGTH;

  for (size_t i = 0; i < params->node->children_len; i++)
    {
      dumping_params_t child_params = {
        .node = params->node->children[i],
        .writing_ptr = &link_ptr,
        .start_of_buffer = params->start_of_buffer,
        .max_len = &link_max_len,
      };

      err = dump (&child_params);
      if (err)
        {
          fprintf (stderr, "dumper.c : internal_link_inline_dumper() : error while processing link content.\n");
          return 1;
        }
    }

  if (strlen (link_def) == 0)
    {
      fprintf (stderr, "dumper.c : internal_link_inline_dumper() : warning : empty link detected.\n");
      snprintf (link_def, 15, "redlink");
    }

  char *text = strstr (link_def, "|");
  char url[MAX_LINK_LENGTH] = {0};
  char escaped_url[MAX_LINK_LENGTH] = {0};
  snprintf (url, (text ? (size_t) (text - link_def) : strlen (link_def)) + 1, "%s", link_def);
  escape_url_for_markdown (url, escaped_url);

  if (text)
    text++;

  if (!text || !strlen (text))
    text = url;

  size_t out_len = strlen (text) + strlen (escaped_url) + 7;
  snprintf (*params->writing_ptr, *params->max_len, "[%s](%s.md)", text, escaped_url);
  *params->writing_ptr += out_len;
  *params->max_len -= out_len;

  return err;
}

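After writing the link, the function adds out_len to *params->writing_ptr and subtracts it from *params->max_len to record how many bytes are still available. However, out_len is computed from strlen (text) and strlen (escaped_url), not from what snprintf actually wrote, and it is never checked against *params->max_len. The snprintf call itself truncates to the space it is given, so the damage comes from the bookkeeping that follows. max_len is a size_t, which is unsigned, so when out_len is larger than the remaining space the subtraction cannot go below 0 and instead wraps around to a very large value, while writing_ptr is advanced past the end of the buffer. Later writes through these parameters can then run far beyond the stack buffer.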
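One possible way to avoid the wraparound is to advance the pointer only by what snprintf reports and only when the whole output fit. The sketch below illustrates that pattern under stated assumptions; checked_append is a hypothetical helper, not a function in wiki2md, and it is not the project's actual fix.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format into *writing_ptr, which has *max_len bytes available.
 * The pointer and the remaining length are updated only if the complete
 * output (plus the terminating NUL) fit. Returns 0 on success and 1 if the
 * output would have been truncated or formatting failed.
 * (Hypothetical helper.) */
static int checked_append(char **writing_ptr, size_t *max_len,
                          const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(*writing_ptr, *max_len, fmt, ap);
    va_end(ap);

    if (written < 0 || (size_t) written >= *max_len) {
        /* Discard the partial output so the buffer stays as it was. */
        if (*max_len > 0)
            **writing_ptr = '\0';
        return 1;
    }

    /* written < *max_len here, so the subtraction cannot wrap. */
    *writing_ptr += (size_t) written;
    *max_len -= (size_t) written;
    return 0;
}
```

In internal_link_inline_dumper this would correspond to replacing the separate out_len arithmetic with a single call such as checked_append (params->writing_ptr, params->max_len, "[%s](%s.md)", text, escaped_url) and returning an error when it fails.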