CS50 speller.c中load函数中使用哈希表时出现分段错误。

huangapple go评论74阅读模式
英文:

CS50 speller.c segmentation fault with hash table in load function

问题

我正在学习 CS50,目前在解决问题集 'speller.c'。抱歉,如果这只是一个简单的修复,但我对 C 语言的理解还很肤浅。

我已经编写了所有的函数,但当我尝试运行程序时,出现了段错误。使用 debug50 调试后,它告诉我段错误与哈希表有关,或者是一个叫做 node 的自定义数据类型的数组。Debug50 告诉我,当我尝试运行一个 if 语句时发生了段错误,即 "if (table[hashnum]->next == NULL)",这在 'load' 函数中。

我在网上查找了段错误可能发生的原因,但据我所了解,它发生在访问已释放的指针时,当我没有足够的空间时,或者当我试图访问我不允许访问的内存,或者尝试写入内存的“只读”部分。另外,根据我对初始化全局数组的了解,程序开头创建的数组应允许我在其中读取和写入,所以我不确定我做错了什么。

感谢任何帮助和解释,谢谢 CS50 speller.c中load函数中使用哈希表时出现分段错误。

以下是我的代码。

此外,问题集的指南和目标可以在 https://cs50.harvard.edu/x/2023/psets/5/speller/ 找到。

(代码部分已省略)

英文:

I'm taking CS50 and currently on problem set 'speller.c'. Sorry if this is a simple fix, but i'm just barely understanding C.

I've written all of the functions, but when I try to run the program I get a segmentation fault. After using debug50, it tells me that the segmentation fault has something to do with the hash table, or the array of a custom data type called node. Debug50 tells me that it happens when i try to run an if statement, which is "if (table[hashnum]->next == NULL)", which is in the 'load' function.

I've looked online as to why the seg fault could be happening, but from what I understand, it happens when I access freed pointers, when i dont have enough space, when i try to access memory i'm not allowed to access, or try to write in a 'read only' part of memory. Also, from what I understand about initializing global arrays, the one made in the beginning of the program should allow me to read and write in it, so I'm not sure what i'm doing wrong.

Any help and explanations are appreciated, thanks CS50 speller.c中load函数中使用哈希表时出现分段错误。

Below is my code.

Also, the guidelines and goal of the problem set are found at https://cs50.harvard.edu/x/2023/psets/5/speller/.

// Implements a dictionary's functionality

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

#include "dictionary.h"

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// TODO: Choose number of buckets in hash table
const unsigned int N = 26;

// Hash table
node *table[N];

// Global Variables
char tmp[LENGTH + 1];
int wordcount = 0;
bool hashnull = false;

// New function prototype
void freehash(node* node);

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    // Runs word through hash function
    int hashnum = hash(word);
    node * tmpnode = table[hashnum]->next;
    // Compares all words in linked list given by hashnum to 'word' to see if any match/if 'word' is spelled correctly
    while (strcasecmp(word, tmpnode->word) != 0)
    {
        if (tmpnode->next == NULL)
        {
            return false;
        }
        tmpnode = tmpnode->next;
    }

    return true;
}

// Hashes word to a number
unsigned int hash(const char *word)
{
    // TODO: Improve this hash function
    return toupper(word[0]) - 'A';
}

// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
    // Opens dictionary
    FILE *dict  = fopen(dictionary, "r");
    // Checks if dictionary opened successfully
    if (dict == NULL)
    {
        fclose(dict);
        printf("\n\n Could not load dictionary.\n");
        return false;
    }
    // Load dictionary into hash table
    node *tmpnode;
    int hashnum;
    char check;
    while(fread(&check, sizeof(char), 1, dict))
    {
        fseek(dict, -1, SEEK_CUR);
        //// Take a single word from dictionary and put into dictword (new node)
        // Make new node/element to add to hash table
        node *dictword = malloc(sizeof(node));
        if (dictword == NULL)
        {
            fclose(dict);
            free(dictword);
            return false;
        }
        tmpnode = dictword;

        // Iterates through all of the letters in a word from dictionary and ends when next line (/n) is read
        int tmpctr = 0;
        while(fread(&dictword->word[tmpctr], sizeof(char), 1, dict))
        {
            if(dictword->word[tmpctr] == '\n')
            {
                break;
            }
            else
            {
                tmpctr++;
            }
        }
        // Run hash function to see where in hash table new word will go
        hashnum = hash(dictword->word);
        //// Put new node for current word into hash table
        if (table[hashnum]->next == NULL)
        {
            table[hashnum]->next = tmpnode;
            dictword->next = NULL;
        }
        else
        {
            tmpnode = table[hashnum]->next;
            dictword->next = tmpnode;
            table[hashnum]->next = dictword;

        }

        // Increase wordcount
        wordcount++;
    }
    // Close dictionary stream (after success)
    fclose(dict);
    // ***** NO NEED TO MALLOC HERE; THATS WHAT THE UNLOAD FUNCTION IS FOR
    return true;
}

// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
    if (wordcount > 0)
    {
        return wordcount;
    }
    else
    {
        return 0;
    }
}

// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
    node * tmpnode;
    for(int i = 0; i < N; i++)
    {
        tmpnode = table[i]->next;
        freehash(tmpnode);
    }
    return true;
}

void freehash(node* node)
{
    // If node currently being 'freed' isn't the last node, call freehash() with next node in line
    if (node->next != NULL)
    {
        freehash(node->next);
    }

    free(node);

    return;
}

答案1

得分: 1

  1. load(): 如果fopen()失败,你会执行fclose(NULL),导致段错误。

  2. load(): 由于node *table[N]是全局变量,它会被初始化为零。在load()函数中,你执行table[hashnum]->next,但会导致段错误,因为table[hashnum]为NULL。也许你想要这样做:

if (!table[hashnum]) {
    table[hashnum] = tmpnode;
} else if (table[hashnum]->next) {

另外,将tmpnode的作用范围最小化,只在需要它的地方使用它(即在设置->next的情况下)。这样可以使你的代码更容易阅读。

  1. load(): 目前你包括了文件中的'\n',但你可能不应该包括它,而是应该以NUL终止你的字符串:
size_t i = 0;
for(; i < LENGTH && fread(&dictword->word[i], 1, 1, dict) && dictword->word[i] != '\n'; i++);
dictword->word[i] = '
size_t i = 0;
for(; i < LENGTH && fread(&dictword->word[i], 1, 1, dict) && dictword->word[i] != '\n'; i++);
dictword->word[i] = '\0';
'
;
  1. load(): 文件读取逻辑有点奇怪,首先读取一个字节,然后将文件指针回退,然后逐个字母地读取,而不检查单词是否过长。正如@Barmar建议的,可以使用fgets()获取一行,然后使用strcspn()\n替换为\0
英文:
  1. load(): If fopen() fails you fclose(NULL) which segfaults.

  2. load(): As node *table[N] is a global variable it is zero initialized. In load() you do table[hashnum]-&gt;next which segfaults as table[hashnum] is NULL. Maybe you want:

		if (!table[hashnum]) {
			table[hashnum] = tmpnode;
        } else if (table[hashnum]-&gt;next) {

As aside, minimize the scope of tmpnode to just where it's needed (which is the case where ->next is set). This makes your code easier to read.

  1. load(): You currently include the '\n' of the file but you probably shouldn't and instead want to NUL terminate your string instead:
        size_t i = 0;
		for(; i &lt; LENGTH &amp;&amp; fread(&amp;dictword-&gt;word[i], 1, 1, dict) &amp;&amp; dictword-&gt;word[i] != &#39;\n&#39;; i++);
		dictword-&gt;word[i] = &#39;
        size_t i = 0;
for(; i &lt; LENGTH &amp;&amp; fread(&amp;dictword-&gt;word[i], 1, 1, dict) &amp;&amp; dictword-&gt;word[i] != &#39;\n&#39;; i++);
dictword-&gt;word[i] = &#39;\0&#39;;
&#39;;
  1. load(): The file reading logic is kinda odd, read one byte, then you back up the file pointer then read letter by letter without checking if word is too long. As @Barmar suggest, just get a line with fgets(), then use strcspn() to replace the \n with a \0.

答案2

得分: 0

Code also has subtle bugs beyond CS50 expectations.

OP hashed with return toupper(word[0]) - 'A';

  • This should be done as return toupper(((unsigned char *)word)[0]) - 'A'; as char may be signed with word[0] < 0 and toupper(int) is defined for unsigned char values and EOF. Characters should be examined as if there are unsigned char even when char is signed.

  • strcasecmp(), although not a standard function, more often converts to lower (than toupper()) and then compares. When the case mapping is not 1-to-1, e.g. toupper() maps ÿ and y to Y, but tolower() maps Y to y. The hash with toupper will not return 0 with strcasecmp("ÿ","y"). Best to use the same case to-ness.

英文:

Code also has subtle bugs beyond CS50 expectations.

OP hashed with return toupper(word[0]) - &#39;A&#39;;

  • This should be done as return toupper(((unsigned char *)word)[0]) - &#39;A&#39;; as char may be signed with word[0] &lt; 0 and toupper(int) is defined for unsigned char values and EOF. Characters should be examined as if there are unsigned char even when char is signed.

  • strcasecmp(), although not a standard function, more often converts to lower (than topper()) and then compares. When the case mapping is not 1-to-1, e.g. toupper() maps &#255; and y to Y, but tolower() maps Y to y. The hash with toupper will not return 0 with strcasecmp(&quot;&#255;&quot;,&quot;y&quot;). Best to use the same case to-ness.

huangapple
  • 本文由 发表于 2023年2月14日 01:28:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/75439278.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定